Introduction


Over the last 30 years, we have worked to develop a theory and 
supporting evidence to account for the evolution of the human capacity for 
culture and how this capacity leads to distinctive evolutionary patterns. Much of 
our early work is summarized in our book Culture and the Evolutionary Process, 
published in 1985. Since that time we have published numerous articles that 
expand the theory and discuss relevant data. We think that these articles fit to- 
gether to tell a consistent story about how the capacity for culture evolved in the 
human lineage and why it has led to evolutionarily novel outcomes like large-scale 
cooperation. However, because this work is relevant to scholars in disciplines 
ranging from evolutionary biology to archaeology to economics, these essays are 
scattered among an equally wide range of journals. As a result, the overall story 
is not so easy to discern. So when Steve Stich suggested that we might bring a 
sampling of this work together in a single volume of his Evolution and Cognition 
series, we jumped at the chance. 

Our research program can be summarized in five propositions: 

1. Culture is information that people acquire from others by teaching,
imitation, and other forms of social learning. On a scale unknown in 
any other species, people acquire skills, beliefs, and values from the 
people around them, and these strongly affect behavior. People 
living in human populations are heirs to a pool of socially trans- 
mitted information that affects how they make a living, how they 
communicate, and what they think is right and wrong. The infor- 
mation thus stored and transmitted varies from individual to indi- 
vidual and is a property of the population only in a statistical sense. 

2. Culture change should be modeled as a Darwinian evolutionary process.
Culture changes as some ideas and values, or “cultural variants,”
become more common and others diminish. A theory of culture 
must account for the processes in the everyday lives of individuals 
that cause such changes. Some of these processes arise from human 
psychology because some ideas are more readily learned or remem- 
bered. Other processes are social and ecological. Some ideas make 
people richer, live longer, or migrate more often, and the resulting 
selective processes generate culture change. While making frequent 
use of ideas and mathematical tools from population biology in 
modeling such culture change, ultimately the theory must derive 
from the empirical facts of how culture is stored and transmitted. 

3. Culture is part of human biology. The capacities that allow us to 
acquire culture are evolved components of human psychology, and 
the contents of cultures are deeply intertwined with many aspects of 
our biology. What we learn, what we feel, how we think, and how 
we remember are all shaped by the architecture of human minds 
and bodies shaped over the millennia by the ongoing action of or- 
ganic evolution. As a result, much cultural variation can be under- 
stood in terms of human evolutionary history. 

4. Culture makes human evolution very different from the evolution of 
other organisms. Humans, unlike any other living creature, have 
cumulative cultural adaptation. Humans learn things from others, 
improve those things, transmit them to the next generation, where 
they are improved again, and so on, leading to the rapid cultural 
evolution of superbly designed adaptations to particular environ- 
ments. This ability has allowed human populations to become the 
most widespread and variable species on earth. At the same time, 
because cumulative cultural evolution makes available ideas that no 
individual could discover and technology that no individual could 
invent, it requires a degree of credulity. While individuals are not 
passive receptacles of their culture, they cannot vet every belief and 
value their culture makes available, and this opens the door to the 
spread of “maladaptive” ideas, ideas that would never evolve in a 
noncultural organism. Moreover, the fact that much culture is ac- 
quired from people other than parents means that such maladaptive 
ideas tend to accumulate. 

5. Genes and culture coevolve. Because culture creates durable changes 
in human behavior, human genes evolve in a culturally constructed 
environment. This environment, in turn, generates selection on 
genes. The evolution of language is an example. We apparently have 
a complex innate system for hearing, speaking, and learning language. 
This capacity would likely be useless without complex languages 
to learn. Primitive languages presumably created a cultural world 
in which better innate language skills were favored by selection. 
Through repeated rounds of coevolution, complex languages and the 
costly apparatus necessary to operate them emerged. Such effects are 
probably pervasive. The existence of complex technology depends
upon great facility in observational learning, and complex social in- 
stitutions depend on people being adept at learning the rules of so- 
cial games. Our ape relations can learn only rudimentary bits of 
language and rudimentary technical and social skills. They have only 
rudimentary cultural traditions of any kind. Most of what human 
organic evolution has been about is the coevolution of capacities for 
culture and cultural traditions. 

The first two propositions have to do with how culture works, and the last three 
have to do with how cultural evolution interacts with genetic evolution. 

Both of us have a background in biology, and our first work was published 
during the heat of the sociobiology controversy, so you might think, as many do, 
that our work arose from an interest in culture and genetic evolution. However, the 
truth is that our entree to the subject came from trying to understand how cultural 
evolution worked to generate human behavior, especially behavior affecting the 
environment. Our collaboration began in 1974 when we co-taught Environmental 
Studies 10, a survey of environmental studies for nonmajors at U. C. Davis. At that 
time Pete was an assistant professor in Environmental Studies and Rob was a 
finishing graduate student in the Ecology Graduate Group. ES 10 was typically 
organized around a series of environmental problems — the population explosion, 
resource depletion, air and water pollution, and so on. We had the idea of orga- 
nizing it around the principle that individual, goal-seeking behavior sometimes led 
to outcomes that were bad for everyone and in this way bringing together ideas 
from ecology and economics. However, we also wanted to discuss human impacts 
on the environment in ancient and contemporary nonindustrial societies, so this 
meant going beyond economics. We knew that one of the then-dominant schools in 
anthropology, cultural ecology, held that much cultural variation could be un- 
derstood as adaptations to local environments, so this didn’t seem like it would be 
much of a problem. Such is the way of young men. 

When we actually sat down to learn what the social sciences had to say about 
culture, and how cultures adjusted to their environment, we were frustrated and 
disappointed. Cultural ecologists provided lots of interesting empirical examples 
of how behavioral variation could be understood as adaptations to environmental 
differences. However, there was little discussion, and no consensus, about how 
such adaptation occurred. To make matters worse, prominent authors like Marvin 
Harris explained some behaviors in terms of their function at the group level (the 
male supremacist complex in the Amazon conserved game) and others in terms 
of individual advantage (Indians do not eat cows because they are more useful 
for traction). Since environmental problems often arise because the interests of
individuals and groups conflict, we found this more than disconcerting. Other 
social scientists, symbolic anthropologists, social anthropologists, and many so- 
ciologists, refused to explain culture in terms of individual decisions and char- 
acteristics as a matter of principle. (A distinguished sociologist once astounded us 
with the claim that it had been proven that it was impossible to do so.) 

Of course, the rational actor model that predominates in economics and 
political science provides a very clear picture of how aggregate behavior arises 
from individual choices. Human actors are assumed to come equipped with preferences
that describe how they rank outcomes and beliefs that express what they
think is the connection between their actions and outcomes. Behavior emerges as 
people rationally choose the actions that produce the best mix of outcomes. 
Variation between groups of people arises because different groups face different 
conditions. The problem is that rational actor theorists do not offer an account of 
where the preferences and beliefs come from. Scholars working in such tradi- 
tions usually don’t deny that culture is real and important but maintain that 
worrying about how it gives rise to preferences and beliefs is just not part of their 
job description. 

Darwinian students of human behavior proposed to rectify the lack of a 
theory of preferences and beliefs with evolutionary theory. Organisms should 
prefer to maximize their genetic fitness, or rather prefer and believe things that 
would have led to fitness maximization in the past. This is a strong theory, and 
certainly part of the answer. Darwinians, like economists, do not usually deny 
that culture plays a role in the formation of preferences and beliefs. But, like 
economists, they seldom enter terms representing culture into their models or 
collect much data about cultural variation. 

This benign neglect of culture is usually accompanied by a largely unartic- 
ulated prejudice against cultural explanations. Confronted with differences in 
marriage systems, inheritance rules, or economic organization, such scholars pre- 
fer almost any economic or ecological explanation, no matter how far-fetched, 
over explanations that invoke cultural history. From table talk we gather that one 
reason is that those students of human behavior who aspire to “hard” scientific
explanations are reacting to the “soft” methods of the historians, anthropologists,
and sociologists who frequently propose cultural explanations. Blaming the mes- 
sengers, if such is the case, seems to us unwise. 

We think the way to make cultural explanations “hard” enough to enter 
into principled debates is to use Darwinian methods to analyze cultural evolu- 
tion. Think of culture as a pool of information, mainly stored in the brains of a 
population of people. This information gets transmitted from one brain to an- 
other by various social learning processes. We define culture as follows: Culture 
is information capable of affecting individuals’ behavior that they acquire from other
members of their species by teaching, imitation, and other forms of social transmission.
By “information,” we mean any individual attribute that is acquired or
modified by social learning and affects behavior. Most culture is mental states, 
but not all. Think of the blacksmith’s proverbial muscular arms or the model’s
waif-like figure — essential parts of their crafts. We often use everyday words like 
idea, knowledge, belief, value, skill, and attitude to describe this information, but 
we do not mean that such socially acquired information is always consciously 
available, or that it corresponds to folk-psychological categories. People in cul- 
turally distinct groups behave differently mostly because they have acquired dif- 
ferent beliefs, preferences, and skills, and these differences persist through time 
because the people of one generation acquire their beliefs and attitudes from 
those around them. 

To understand how cultures change, we set up an accounting system that 
describes how cultural variants are distributed in the population and how various 
processes, some psychological, others social and ecological, cause some variants
to spread and others to decline. The processes that cause such cultural change 
arise in the everyday lives of individuals as people acquire and use cultural in- 
formation. Some values are more appealing and thus more likely to spread from 
one individual to another. These will tend to persist while less attractive alter- 
natives tend to disappear. Some skills are easy to learn accurately while others 
are likely to be transformed during social learning. Some beliefs cause people to 
be more likely to be imitated, because the people who hold those beliefs are 
more likely to survive or more likely to achieve social prominence. Such beliefs 
will tend to spread while beliefs that lead to early death or social notoriety will 
disappear. We want to explain how these processes, repeated generation after 
generation, account for observed patterns of cultural variation. 
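
The accounting described above can be made concrete with a minimal sketch. The recursion below is an illustrative assumption, not a model taken from this volume: it tracks the frequency p of one of two cultural variants and supposes that a single parameter, here called bias, summarizes how much more readily that variant is adopted than its alternative. The function names and parameter values are hypothetical.

```python
def next_frequency(p, bias):
    """One generation of biased social learning for a two-variant trait.

    p    : current frequency of variant A in the population (0 to 1)
    bias : assumed net advantage of A in being adopted (-1 to 1); hypothetical
    """
    # A spreads fastest when both variants are common, because learners
    # must encounter both in order to prefer one over the other.
    return p + bias * p * (1.0 - p)


def trajectory(p0, bias, generations):
    """Frequencies of variant A over successive generations, starting at p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], bias))
    return freqs
```

With a positive bias the favored variant rises from rarity toward fixation along an S-shaped curve; with zero bias its frequency does not change, which is why some process must be specified before the bookkeeping predicts any cultural change at all.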

We find it hard to recollect the exact pathway that brought us to this way of 
thinking. For sure, we were influenced by Donald T. Campbell’s famous 1965 
essay, and by an early (1973) article of Luca Cavalli-Sforza and Marc Feldman. 
The general idea was somehow in the air in the early 1970s as F. T. Cloak, 
Eugene Ruyle, Richard Dawkins, Bill Durham, and Ron Pulliam and Christopher 
Dunford published work espousing a similar approach to culture. Somewhat 
later (in 1978), we were fortunate to sit in on a class taught by Cavalli-Sforza and 
Feldman that was very helpful, especially in adapting models from population 
genetics to model the population dynamics of cultural variants. We recall thinking
that applying the evolutionary biologists’ concepts and methods to the study
of culture was a rather obvious thing to do. We were more than pleasantly 
surprised that our predecessors had left so much relatively easy and interesting 
work undone. As Geoff Hodgson and Robert Richards have discovered, a prop- 
erly evolutionary social science formed in the late nineteenth and early twentieth 
centuries before dying an untimely death. 

As we were first thinking these thoughts, what came to be called the so- 
ciobiology controversy burst into full bloom. The mid-1960s saw the birth of the 
modern theory of the evolution of animal behavior. Bill Hamilton’s seminal 
articles on inclusive fitness and George Williams’s book Adaptation and Natural 
Selection were the foundations. The next decade saw an avalanche of important 
ideas on the evolution of sex ratio, animal conflicts, parental investment, and 
reciprocity, setting off a revolution in our understanding of animal societies, a 
revolution still going on today. By the mid-1970s a number of people, including 
Dick Alexander, Ed Wilson, Nap Chagnon, Bill Irons, and Don Symons, began 
applying these ideas to understand human behavior. Humans are evolved crea- 
tures, and quite plausibly our societies were shaped by the same evolutionary 
forces that shaped the societies of other animals. Moreover, the new theory of 
animal behavior — especially kin selection, parental investment, and optimal for- 
aging theory — seemed to fit the data on human societies fairly well. The reaction 
from much of the social sciences was, to put it mildly, pretty negative. 

The causes of this reaction are complex, as Ullica Segerstrale has shown. The 
association of biological ideas with racist, eugenicist ideas during the early part of 
the last century surely played an important role. Another big problem was that 
many social scientists mistakenly thought about these problems in terms of 
nature versus nurture. On this view, biology is about nature; culture is about 
nurture. Some things, like whether you have sickle-cell anemia, are determined
by genes — what we call nature. Other things, like whether you speak English or 
Chinese, are determined by the environment — nurture. Evolution shapes ge- 
netically determined behaviors, but not learned behaviors. Social scientists knew 
that culture played an overwhelmingly important role in shaping human behavior,
and they reasoned that, since culture is learned, evolutionary theory has little
to contribute to understanding human behavior.

The problem was that this argument cut no ice with anybody who knew 
much about evolutionary biology. Although the nature-nurture way of thinking 
is common, biologists know that it is deeply mistaken. Traits do vary in how 
sensitive they are to environmental differences, and it is sensible to ask whether 
differences in traits are mainly due to genetic differences or differences in the 
environment. However, the answer you get to this question tells you nothing 
about whether the traits in question are adaptations shaped by natural selection. 
The reason is that every bit of the behavior (or physiology or morphology, for that 
matter) of every single organism living on the face of the earth results from the
interaction of genetic information stored in the developing organism and the 
properties of its environment, and if we want to know why the organism develops 
one way in one environment and a different way in a different environment, we 
have to find out how natural selection has shaped the developmental process of 
the organism. This logic applies to any trait, learned or not. Moreover, biologists 
have been quite successful in applying adaptationist reasoning to explain learned 
behavior. 

Because the critique was framed in terms of nature versus nurture, the evolutionary
social science community by and large rejected the idea that culture makes any 
fundamental difference in the way that evolutionary thinking should be applied 
to humans. The genes underlying the psychological machinery that gives rise to 
human behavior were shaped by natural selection, so, at least in ancestral en- 
vironments, the machinery must have led to fitness-enhancing behavior. If it goes 
wrong in modern environments, it is not culture that is the culprit, but the fact 
that our evolved, formerly adaptive psychology “misfires” these days. 

We think that both sides in this debate got it wrong. Culture completely 
changes the way that human evolution works, but not because culture is learned. 
Rather, the capital fact is that human-style social learning creates a novel evo- 
lutionary trade-off. Social learning allows human populations to accumulate re- 
servoirs of adaptive information over many generations, leading to the cumulative 
cultural evolution of highly adaptive behaviors and technology. Because this 
process is much faster than genetic evolution, it allows human populations to 
evolve (culturally) adaptations to local environments — kayaks in the arctic and 
blowguns in the Amazon — an ability that was a masterful adaptation to the cha- 
otic, rapidly changing world of the Pleistocene epoch. However, the same psy- 
chological mechanisms that create this benefit necessarily come with a built-in 
cost. To get the benefits of social learning, humans have to be credulous, for the 
most part accepting the ways that they observe in their society as sensible and 
proper, but such credulity opens human minds to the spread of maladaptive 
beliefs. The problem is one of information costs. The advantage of culture is that 
individuals don’t have to invent everything for themselves. We get wondrous 
adaptations like kayaks and blowguns on the cheap. The trouble is that a greed for
such easy adaptive traditions easily leads to perpetuating maladaptations that somehow
arise. Even though the capacities that give rise to culture and shape its
content must be (or at least have been) adaptive on average, the behavior observed
in any particular society at any particular time may reflect evolved maladaptations. 
Empirical evidence for the predicted maladaptations is not hard to find. 

Much of our work has been directed at understanding the evolution of the 
psychological capacities that both permit and shape human culture (see part I). 
Most evolutionary thinkers approach this problem by first asking how evolution 
should have shaped the psychology of a group-living, foraging hominid. Then, 
having answered that question, they ask how the evolved psychology will shape 
human culture. The implicit evolutionary scenario seems to be that Pleisto- 
cene hominids were just extra-smart chimpanzees, clever social animals in whom 
social learning played a negligible role until the evolution of our brain was more 
or less complete. Then we took up culture, whose evolution is completely con- 
trolled by the preexisting evolved mind. First, we got human nature by genetic 
evolution; then, culture happened as an evolutionary by-product. 

This way of thinking neglects the feedback between the nature of human 
psychology and the kind of social information that this psychology should be 
designed to process. For us to take bitter medicine, our psychology has to have 
evolved both to learn socially and to let social learning override aversive stimuli 
from time to time. As we discuss in chapters 1 and 2, social learning can be 
adaptive because the behavior of other individuals is a rich source of information 
about which behaviors are adaptive and which are not. We all know that pla- 
giarism is often easier than the hard work of writing something oneself, and 
imitating the behavior of others can be adaptive for the same reason. The trick is 
that once social learning becomes important, the nature of the behavior that is 
available to imitate is itself strongly affected by the psychology of social learning. 
Suppose, for example, that everyone relied completely on imitation. Then, even 
if we somehow started with highly adapted traditions, behavior would gradually 
become dysfunctional as the environment changed and errors crept into the tra- 
ditions. To understand the evolution of the psychology that underlies social 
learning, one must take this sort of feedback into account. We want to know 
how evolving psychology shapes the social information available to individuals 
and how selection shapes psychology in an environment with direct information 
from personal experience and the potential to use the behavior of others at a 
lower cost but perhaps greater risk of error. The research reported in these 
chapters suggests that this kind of reasoning leads to conclusions quite differ- 
ent from those of other evolutionary theories of human behavior. Under the 
right conditions, selection can favor a psychology that causes most people to 
adopt behaviors “just” because the people around them are using those behaviors.
Weak psychological forces that derive from people occasionally tweaking
their traditions in adaptive directions are sufficient to maintain the tradition in an 
adapted state so long as the environment is not changing too rapidly and the 
cultural analog of mutation is not too disruptive. 
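
The feedback argument in this paragraph can be illustrated with a small sketch. It is a deliberately crude bookkeeping exercise under assumed parameters, not the models of chapters 1 and 2: q is the fraction of the population whose behavior is currently adapted, a fraction `learners` of each generation works things out individually and succeeds with probability `accuracy`, and everyone else imitates. The parameter `loss` lumps together copying errors and environmental change. All names and values are hypothetical.

```python
def next_adapted_fraction(q, learners, accuracy, loss):
    """Fraction with adapted behavior after one generation.

    q        : fraction adapted among the previous generation
    learners : fraction relying on individual learning (assumed)
    accuracy : chance that individual learning finds the adapted behavior
    loss     : chance that a copied behavior is no longer adapted, through
               copying error or environmental change (assumed, lumped together)
    """
    imitators = 1.0 - learners
    return imitators * q * (1.0 - loss) + learners * accuracy


def adapted_after(generations, q0, learners, accuracy, loss):
    """Iterate the recursion for a number of generations, starting from q0."""
    q = q0
    for _ in range(generations):
        q = next_adapted_fraction(q, learners, accuracy, loss)
    return q
```

With `learners` set to zero, even a perfectly adapted tradition decays toward zero as losses accumulate. With a small fraction of individual learners, the same recursion settles at a substantial adapted fraction, which is the sense in which weak forces can maintain a tradition.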

If the only processes shaping culture arose from our innate evolved psy- 
chology, then culture would be a strictly proximate cause of human behavior. 
However, not all of the processes shaping culture arise from our innate psychology.
From the beginning of our work, we have emphasized that culture leads
to the spread of maladaptive cultural variants (see Richerson and Boyd 1976, 
1978). Culture is not always, or even typically, transmitted from parents to
offspring. Instead, cultural variants are acquired from all kinds of people. This is 
a good thing because sampling a wider range of models increases the chance of 
acquiring useful information. However, acquiring adaptive information from 
others also opens a portal into people’s brains through which maladaptive ideas 
can enter: ideas whose content makes them more likely to spread but that do not
increase the genetic fitness of their bearers. Such ideas can spread because they 
are not transmitted as genes are. For example, in the modern world, beliefs 
that increase the chance of becoming an educated professional can spread even if 
they limit reproductive success because educated professionals have high status 
and thus are likely to be emulated. Professionals who are childless can succeed
culturally as long as they have an important influence on the beliefs and goals of 
their students, employees, or subordinates. The spread of such maladaptive ideas 
is a predictable by-product of cultural transmission. 

Selection acting on culture is an ultimate cause of human behavior just like 
natural selection acting on genes. In several of the chapters in part III we argue 
that much cultural variation exists at the group level. Different human groups 
have different norms and values, and the cultural transmission of these traits can 
cause such differences to persist for long periods. The norms and values that 
predominate in a group plausibly affect the probability that the group is suc- 
cessful, whether it survives, and whether it expands. For illustration, suppose 
that groups with norms that promote patriotism are more likely to survive. This 
selective process leads to the spread of patriotism. Of course, this process may be 
opposed by an evolved innate psychology that biases social learning, making us 
more prone to imitate, remember, and invent nepotistic beliefs than patriotic 
beliefs. The long-run evolutionary outcome would then depend on the balance 
of these two processes. Again, for illustration, let us suppose that the net effect 
of these opposing processes causes patriotic beliefs to predominate. Then, the 
population behaves patriotically because such behavior promotes group survival, 
in exactly the same way that the sickle-cell gene is common in malarial areas 
because it promotes individual survival. Human culture participates in ultimate 
causation. 

This way of thinking about cultural evolution leads to a picture of a pow- 
erful adaptive system necessarily accompanied by exotic side effects. Some of 
our evolutionist friends take a dim view of this notion, seeing it as giving aid and 
comfort to those who would deny the relevance of evolution to human affairs. 
We prefer to think that the population-based theories of cultural evolution 
strengthen Darwin’s grasp on the human species by giving us for the first time a 
tentative picture of the engine that powered the furious pace of change in the 
human species over the last few hundred thousand years. Compare us to our ape 
cousins. They still live in the same tropical forests in the same small social groups 
and eat the same fruits, nuts, and bits of meat as our common ancestors did. By 
the late Pleistocene epoch (say 20,000 years ago), human foragers already oc- 
cupied a much wider geographical and ecological range than any other vertebrate 
species, using a remarkable range of subsistence systems and social arrangements.
Over the last ten millennia we have exploded to become the earth’s dominant 
organism by dint of deploying ever more sophisticated technology and social 
systems. The human species is a spectacular evolutionary anomaly, and we ought 
to expect that the evolutionary system behind it is anomalous as well. Our quest 
is for the evolutionary motors that drove our divergence from our ancestors, and 
we believe that the best place to hunt is among the anomalies of cultural evo- 
lution. This does not mean that gene-based evolutionary reasoning is worthless. 
On the contrary, human sociobiologists and their successors have explained a lot 
about human behavior, even though most work ignores the novelties introduced 
by cultural adaptation. However, there is still much to explain, and we think that 
the Darwinian, population-based properties of culture are essential components 
of a satisfactory theory of human behavior. 

PART I

The Evolution of 
Social Learning 


The human species presents evolutionists with a vexing puzzle. 
Complex, cumulatively evolving culture is rare in nature. Simple traditions are 
widespread, and in a few species — whales, dolphins, primates, and birds — 
traditions are fairly complex. However, even the most complex traditions 
in other animals are manifestly simpler than those in human cultures. Our 
capacities to imitate and teach support exceedingly complex and variable 
technological, social, and symbolic systems like art and language, a capability 
that is qualitatively different from that possessed by any other species. 

If another species has a language with thousands of words, a toolkit with 
hundreds of intricate items, and societies composed of a few thousand 
unrelated individuals, we would know of it by now. This fact raises the 
obvious questions: Why now? And why only us? True, some fancy adaptations 
like the elephant’s trunk are unique, but really good tricks like the camera eye 
tend to have evolved repeatedly among the world’s millions of species. Given 
that fancy culture has made humans extraordinarily successful, why isn’t it 
much more common? And why didn’t it arise with the dawn of complex 
animals hundreds of millions of years ago? 

The chapters in this part address these questions. In chapter 1, we 
construct a very simple model of the evolution of social learning. We imagine 
a population in which individuals can learn for themselves but can also 
imitate someone of the previous generation (their mothers, for example). 
These organisms live in a spatially variable world, and their adaptive task is to 
combine their own experience and the vicarious experience acquired from 
their mother to guess how they should behave. There are two types of 
environments: wet and dry. In the dry environment, the best subsistence 
strategy is, say, hunting and gathering. In the wet environment, farming is the
best subsistence strategy. The information available to individuals is noisy. 
On average, individual learning gets the right answer, but sometimes it leads 
to errors. Even in the dry environment, a run of rainy years might lead one 
who depends on individual experience to believe that the environment is 
really wet and hence to mistakenly adopt farming instead of hunting and 
gathering. Individuals can evaluate the quality of their individual experience 
and use this rule: if individual experience is sufficiently accurate, rely on it; 
otherwise, imitate. Some individuals move about on the landscape and may 
find themselves in a wet environment, whereas their mother came from a dry 
one, or vice versa. Depending upon a mother’s traditional wisdom has the
advantage of evading errors due to noisy individual learning. So long as a 
mother’s lineage has not recently switched environments, both natural 
selection and individual learning will have tended to make her ideas about 
the nature of the environment accurate on average. On the other hand, 
if migration has recently removed the mother’s lineage to the other 
environmental state, her received wisdom may well be wrong. 

Though very simple and stylized, this model captures one much-noted 
structural feature of the cultural system, namely, that it is a system for the 
inheritance of acquired variation (often called “Lamarckian inheritance”; 
ironically, this process was as much a part of Darwin’s ideas as Lamarck’s).
The results of the model are quite intuitive. If there is little migration between 
different environment types, the optimal thing to do is rely on individual 
experience only when it is highly accurate and, as a result, imitate most of the 
time. The effect of occasional individual learners is sufficient to keep most 
traditions adapted to the local environment. In the opposite limit, when 
individuals move so much that each generation is placed at random with 
respect to their mom’s environment, imitation information is useless, and the 
adaptive strategy is to depend only upon individual learning — personal expe- 
rience. In between these limits, some weighted average of personal and 
vicarious experience should determine an individual’s choice: more individual 
experience in the mix when migration is relatively frequent and individual 
learning not so error-prone, more tradition when migration is relatively in- 
frequent and individual learning relatively error-prone. Given that all envi- 
ronments are spatially heterogeneous and all animals, and plants for that 
matter, migrate, this model suggests that culture should be common, if not 
ubiquitous. It certainly does not solve the puzzle of human uniqueness; it 
makes that puzzle more difficult. 

These results do not depend too much on the details of the model — 
various models have very similar properties. The spatial model can easily 
be modified to reflect temporal variations with similar results (Boyd and 
Richerson, 1988). Other interactions between individuals’ psychology and 
culture lead to similar effects. We have studied a variety of biased social 
learning effects in which individuals do not learn new variants for themselves 
but rather preferentially copy existing ones using a number of biasing rules 
(Boyd and Richerson, 1985). The models can also be modified to take account 
of social learning within, as well as between, generations. The take-home 
message is that a cultural system of inheritance is an evolutionarily flexible 
system that natural selection could tune to cope with many patterns of en- 
vironmental variability. These models support Darwin’s intuition that imita- 
tion and other forms of social learning should be common, but they give us no 
clue about why our species’ unusually hypertrophied cultural system evolved. 

However, one assumption is crucial. In 1989, Alan Rogers published 
a model with very different qualitative properties. Here, the population 
consisted of two innate types: learners and imitators. Learners learn individ- 
ually and imitators copy someone at random. Rogers showed that the evolu- 
tion of imitation in such a population behaves curiously. Social learning tends 
to be favored; under many conditions, a fair frequency of imitators exists at 
evolutionary equilibrium. However, when the system equilibrates, imitators 
and learners have exactly the same fitness, and since learners always have the 
same fitness, this mixture of imitators and learners has the same fitness as an 
all-learner population before imitators began evolving in it. Social learning 
evolves, but it is not adaptive because the population at equilibrium copes no 
better with a variable environment than a population that doesn’t imitate at 
all. In contrast, in the models introduced in chapter 1, the mean fitness of 
the population is higher at equilibrium — imitation does increase the popula- 
tion’s ability to adapt. In chapter 2 we show that the key difference is the 
effect of imitation on individual learning. In Rogers’s model, the only benefit 
to imitation is that it allows individuals to avoid the costs of learning; 
imitators are scroungers who profit from the costly learning efforts of others. 
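
The equal-fitness result can be made concrete with a small numerical sketch. The parameterization below is an illustrative assumption rather than Rogers's own formulation: the environment is assumed to switch state with probability `u` per generation, learners are assumed always to acquire the correct behavior at a cost `c`, imitators copy a random member of the previous generation at no cost, and correct behavior adds `b` to a baseline fitness `w0`. All function and parameter names are hypothetical.

```python
"""Rogers-style learner/imitator model (illustrative assumptions only).

Assumed setup, not taken from the original paper:
- the environment switches state with probability u per generation;
- learners always acquire the currently correct behavior, at cost c;
- imitators copy a random member of the previous generation for free;
- correct behavior adds b to a baseline fitness w0.
"""


def imitator_accuracy(p, u, generations=2000):
    """Long-run share of imitators holding the correct behavior.

    p is the frequency of imitators. A copied behavior is correct only if
    the model was correct and the environment has not switched since.
    """
    s = 0.0
    for _ in range(generations):
        s = (1.0 - u) * ((1.0 - p) + p * s)
    return s


def fitnesses(p, u, b, c, w0=1.0):
    """Return (learner fitness, imitator fitness) at imitator frequency p."""
    learner = w0 + b - c
    imitator = w0 + b * imitator_accuracy(p, u)
    return learner, imitator


def equilibrium_imitators(u, b, c, tol=1e-10):
    """Bisect for the imitator frequency at which both types do equally well."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        learner, imitator = fitnesses(mid, u, b, c)
        if imitator > learner:
            lo = mid  # imitators still gain, so their frequency rises
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    u, b, c = 0.1, 1.0, 0.3
    p_star = equilibrium_imitators(u, b, c)
    learner, imitator = fitnesses(p_star, u, b, c)
    mean_fitness = (1.0 - p_star) * learner + p_star * imitator
    print(f"imitators at equilibrium: {p_star:.3f}")
    print(f"learner {learner:.3f}, imitator {imitator:.3f}, mean {mean_fitness:.3f}")
```

Under these assumptions the learner's fitness does not depend on how many imitators there are, so the mean fitness at the mixed equilibrium equals that of a population made up entirely of learners, which is the property described above.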

In the model presented in chapter 1, the possibility of imitation increases 
the efficiency of learning by allowing learners to be selective. We show in 
chapter 2 that the ability to accumulate improvements in many small steps 
can have the same effect. 

Nevertheless, Rogers’s model does illustrate an important feature of the 
relationship between individual and social learning. In a cultural population, 
effortful individual attempts to learn or to bias imitation tend to improve the 
average quality of cultural traditions to the benefit of everyone. Selection at 
the individual level will tend to produce less individual learning and bias than 
would be optimal from the point of view of the population because of the 
altruistic effect of social learning on future members of the population. 

Kameda and Nakanishi (2002) have shown experimentally that some human 
subjects produce information while others free ride on the efforts of others. 
Intellectual property protections are a modern method of trying to adjust 
incentives to individuals to gain a more optimal level of creative work than in 
a society in which inventors are parasitized by imitators. Henrich and 
Gil-White (2001) argue that human prestige systems evolved to compensate 
those who seem to have the best ideas to imitate. If, as we argue in part III, 
humans are subject to cultural group selection, many institutions and even 
(via coevolution) innate predispositions may arise to increase the individual 
effort devoted to information updating beyond that favored by individual 
advantage alone. 

Chapter 3 tackles the uniqueness of human culture. The models of 
chapters 1 and 2 suggest that the capacities that give rise to culture can readily 
evolve. Given that culture has made humans so successful, shouldn’t many 
more animals have evolved this world-beating adaptation? In one sense, many 
do. Simple systems of social transmission are quite common (Heyes and Galef, 
1996). What seems unique about human social learning is our ability to 
accumulate adaptive information over many generations, building complex 
artifacts and institutions composed of many small innovations. Even some- 
thing as simple as a good stone-tipped spear reflects cumulative innovations 
applied to the shaft, the hafting, and the stone point. No nonhuman tradition 
yet described approaches such a spear in complexity. Humans can maintain 
complex traditions because we are more accurate imitators than any other 
animal yet tested. Accurate imitation plausibly depends upon costly cognitive 
structures such as a theory of mind. As we show in chapter 3, the evolution 
of such structures faces a major hurdle. Complex cultural traditions are 
the product of a population of minds. Many people and the passage of time are 
necessary for a complex tradition to evolve. In the absence of such a 
population, the costly structures necessary for accurate imitation are useless. 
The rare individual who happens to have the costly structure, perhaps only in 
rudimentary form, will be born into a world with no complex traditions to 
learn and hence no use for the capacity to imitate accurately. 

If correct, this model suggests that the capacities that permit accurate imita- 
tion must have been favored initially for some other purpose. For example, a 
theory of mind may have been favored because it allows better manipulation 
of the social world. Then, this capacity gave rise to more accurate imitation, 
and the cultural evolution of complex adaptive traditions as a side effect. This 
argument provides one explanation for the rarity of cumulative cultural tra- 
dition: humans were the first species to chance on some devious path around 
this constraint, and then we have preempted most of the niches requiring 
culture, inhibiting the evolution of any competitors. 

Chapter 4 provides a different explanation for the rarity of culture, one 
based on recent discoveries about the nature of Pleistocene climates. 
Evolutionists divide explanations of the large-scale, long-term patterns of 
evolution into those internal to the evolutionary process itself and those 
external to it, such as changes in climate. The argument in chapter 3 is a 
typical internal explanation. Evolution always favored a capacity for complex 
culture, but it took life a long time to find its way around constraints and 
evolve complex, cumulative traditions. Such internal explanations are implicit 
in many accounts of our origins. Such accounts flatter our species because 
they assume that an intelligent culture-bearing species is superior to the 
common run of animals. Considering the possibility of external causes is 
a useful antidote to the implicit acceptance of internal explanations, especially 
as they may be the product of anthropocentrism. 

The correlation of brain size with climate variation favors an external 
explanation for the timing of the evolution of culture in humans and other 
animals. Terrestrial vertebrates have been around for some 350 million years. 
Dinosaurs and their allies were not simple animals, but they did have small 
brains. The mammals that coexisted with dinosaurs also had small brains. 
Brain tissue is quite expensive. All else equal, selection will favor the stupidest 
possible creatures. Perhaps dinosaurs and ancient mammals lived in a world 
that did not require much brainpower. For the last 65 million years, the 
average size of mammalian brains has gradually increased. The rate of increase 
has jumped during the last couple of million years. Brain size increase in 
mammals has an interesting parallel in the cooling and drying of climates over 
the last 65 million years, culminating in the sharp average cooling and drying 
and the onset of cycles of glacial advance and retreat that became more 
pronounced about 2.5 million years ago. If in addition to cooling and drying, 
this world has become more variable, we'd have an explanation for why brain 
size has increased in so many lineages. 

The long-known advances and retreats of glaciers take tens of thousands of 
years and thus are far too slow to require much brainpower to cope with. 
However, ice core data published in the early 1990s began to paint a picture 
of hugely variable glacial environments, much more variable than we have 
experienced on the long march to our present civilizations during the last 
11,500 years. Much of this variation is on time scales ranging from a 
millennium to the limits of resolution of the data (a few years; chapter 17 
includes more recent references, see also Helmke, Schultz, and Bauch [2002]). 
These are just the time scales of variation that the models suggest should 
favor a cultural system that can mix and match the conservatism of faithful 
transmission with flexibility of individual learning to generate rapidly evolving 
traditions adapted to rapidly changing environments. Variability on short time 
scales probably also favors individual behavioral flexibility. If this argument is 
correct, we can interpret brain size as a rough bioindicator of the amount 
of fine-scale environmental variability in space and time. Ancient mammals 
were dull because they lived in a dull, little-varying world, whereas modern 
mammals are sharp because they live in a world alive with rapid change. 

The field of paleoclimatology is currently advancing rapidly, and consequently 
our ability to formulate and test such conjectures is increasing. 

Chapter 5 introduces two forms of biased transmission, conformity and 
success-based, that can produce both adaptive and maladaptive evolution- 
ary outcomes. These biases can be thought of as adaptive rules of thumb for 
acquiring adaptive information. If information is costly to acquire, evolution 
will favor fast, frugal heuristics for solving adaptive problems. (The Dahlem 
Conference book from which this chapter is drawn covers this general topic in 
considerable detail.) Imitating mom in the face of the costs of learning for 
one’s self in the style of the model of chapter 1 is a trick to finesse information 
costs. Conforming to the majority is an inexpensive rule to apply, compared, 
say, to doing experiments on the alternative behaviors one might adopt. 

Many adaptive forces will tend to make adaptive behaviors common, 
so adopting the commonest is generally not a bad guess. Similarly, if other 
people's adaptive success is in any way public knowledge, imitating the 
successful is a good rule to follow. 

These quick-and-dirty rules of thumb have interesting evolutionary side 
effects. In part III we discuss how conformity reduces within-group cultural 
variation, making group-level selection a more plausible process than group 
selection on genes is usually thought to be. Imitating the successful can also 
lead to a form of rapid group selection. In part II, we will see how this process 
leads to symbolically marked group boundaries. The other interesting evolu- 
tionary feature of these rules is that under some conditions they can give rise to 
maladaptive behavior. Consider a moral norm that is maintained by a combi- 
nation of conformity and success-based bias. Some such norms, for example, 
the mutilation of genitalia and high rates of female infanticide, are probably 
quite maladaptive. Yet if people conform and if those who violate the norms 
are punished in some way, those who attempt to abandon such practices in 
favor of more adaptive ones will become a stigmatized minority. In this way, 
normally adaptive learning mechanisms can perpetuate dysfunctional behavior 
under the right circumstances. Perhaps one reason why complex human- 
style culture is so rare is that these complexities impose a burden that is worth 
meeting only when the adaptive advantages of culture outweigh this cost. 

We do not tout this family of models and our interpretations of them as 
any more than a first attempt at explaining why social learning evolves, 
especially how our own extraordinary system of complex culture has evolved. 
We do hope to have demonstrated how we can think in a more rigorous 
way about the Big Questions of human life using simple models of cultural 
evolution as a tool. Cultural evolution is rooted in the psychology of in- 
dividuals, but it also creates population-level consequences. Keeping these two 
balls in the air is a job for mathematics; unaided reasoning is completely 
untrustworthy in such domains. 


1 Social Learning as an Adaptation 


Learning is widespread in the animal kingdom. While the mechanisms 
of learning range from relatively simple conditioning in invertebrates to 
elaborate cognitive mechanisms in mammals, most animals use some form of 
learning to acquire behavior that is adaptive in the local habitat. Despite this 
fact, the great bulk of evolutionary theory assumes that organisms adapt to 
variable environments through genetic mechanisms alone. The neglect of learn- 
ing may result from the difficulty of understanding the evolution of learned 
behaviors. Learning entails an evolutionary trade-off. The advantages of learning 
are obvious; it allows the same individual to behave appropriately in different 
environments. For example, by sampling novel foods and learning to avoid 
noxious food types, a cosmopolitan species like the Norway rat can acquire an 
appropriate diet in a wide range of environments. However, learning also has 
disadvantages. First, the learning process itself may be costly. By sampling novel 
foods, the rat may accidentally poison itself, a risk that could be avoided by 
an animal with rigid, genetically specified food preferences. Second, because 
learned behavior is based on imperfect information about the environment, it 
can lead to errors. For example, the rat may fail to sample or mistakenly reject a 
nutritious food item. To understand variation in learned behavior among species, 
one must understand how this evolutionary trade-off is resolved. 

Recently, several authors have used statistical decision theory to show why 
the learning rules of different species vary (McNamara and Houston, 1980; 
Staddon, 1983; Stephens and Krebs, 1988). One can think of individual organ- 
isms as having to “choose” among alternative behaviors to maximize their fitness 
in the local environment. They have some genetically inherited “prior” 
information about the state of local environment, some data from their experience, 
and usually the opportunity to gather more data at some cost in terms of fitness. 
Decision theory is useful because it tells us the best way to make decisions with 
imperfect information. Assuming that natural selection has shaped the learning 
rules of different species so that they are adaptive, decision theory should help 
us to understand why different animals learn differently. In the same way that 
mechanics helps us understand the comparative morphology of skeletons, deci- 
sion theory may help us understand comparative behavior of animals. 

We are interested in understanding the adaptive function of one particular 
form of learning, social learning. By social learning, we mean the acquisition of 
behavior by observation or teaching from other conspecifics. Social learning has 
been implicated in the acquisition of behavior in a variety of taxa. Many songbirds 
acquire their song by copying the song of other adult birds (Marler and Tamura, 
1964). Rats seem to acquire food preferences both from taste cues in their 
mothers' milk, and from the smell of other rats’ pelage (Galef, 1976). There is 
circumstantial evidence that individuals of several different primate species may 
acquire complex new behaviors by social learning (Kawai, 1965; McGrew and 
Tutin, 1978; Hauser, 1988). Finally, social learning plays an essential role in 
human adaptation (Boyd and Richerson, 1985). For reviews of the literature on 
social learning in nonhuman animals, see Galef (1976, 1988). 

In this essay we present several simple mathematical models of social 
learning. Our aim is to use these models to explain social learning as an adap- 
tation in the same way that decision theoretic models have been used to explain 
other forms of learning. The decision theoretic models alone are not sufficient to 
understand the conditions under which social learning is adaptive. Instead, de- 
cision theoretic models must be generalized to allow for the fact that behaviors 
acquired by social learning are transmitted from individual to individual. Thus, 
to understand social learning, we need models that keep track of the processes 
that change the frequency of alternative behaviors in a population through time. 
Consider a young rat learning food preferences. To predict whether it acquires a 
preference for some food, say cilantro, by social learning, we need to know 
whether its mother’s diet includes cilantro. Its mother’s diet will depend on both 
her experience and her own mother’s diet. More generally, to understand why a 
preference for cilantro among a population of rats is becoming more common (or 
more rare), we must know its frequency among rats of previous generations, and 
how this generation’s individual learning experiences changed the frequency of 
the preference between the time that they acquired their initial food preferences 
by social learning and the time that they serve as models for members of the next 
generation. Because behavioral variants are transmitted from individual to in- 
dividual, and thus from generation to generation, understanding social learning 
requires understanding the dynamic processes that act to change the frequency 
of different socially learned behaviors in a population of organisms through time. 
We must link models of individual learning to models of social learning to de- 
termine the evolutionary dynamics of behavioral variants in a population. 

We will use these models to address two questions about the adaptive 
function of social learning: 

1. Under what circumstances should natural selection favor increased reliance 
upon social learning at the expense of individual learning? We will begin by analyzing 
a model in which a population of organisms acquires behavior by a combination 
of individual and social learning in a uniform and constant environment. This 
model indicates that, on average, in constant environments reliance on social 
learning always leads to higher fitness than reliance on individual learning. We 
will then add environmental variability to the model. Under these conditions, 
there is an optimal mix of social and individual learning. The relative importance 
of social learning in the optimal mix is increased when environments are pre- 
dictable and when individual learning is error-prone. 

2. Given that naive individuals experience the behavior of a number of expe- 
rienced individuals, and that this behavior varies, how should social learning be 
structured? Here we will consider a model in which naive individuals are exposed 
to a finite sample of the behavior of members of the previous generation. We 
will refer to this set of observed and potentially imitated individuals as “mod- 
els.” Naive individuals will be exposed to different combinations of behavior 
that they can imitate. The analysis suggests that in a variable environment, se- 
lection favors individuals who are predisposed to acquire the most common 
behavior among their models. It also suggests that selection favors individuals 
whose propensity to rely on individual learning increases as the variability among 
their set of models increases. 


A Model of Individual and Social Learning 

We begin by addressing this question: when does social learning allow a more 
effective tracking of the environment than individual learning? To answer this 
question, we want to construct a model that embodies the following assump- 
tions about the interaction of social and individual learning: 

1. A population of organisms is potentially confronted with a variable 
environment in which different behaviors are favored by selection in 
different habitats. 

2. Individuals in the population can acquire their behavior by some 
mixture of social learning and individual learning, where: 

3. Social learning involves the faithful copying of the behavior of a 
single other individual in the population, and: 

4. Individual learning occasionally leads to errors. 

5. All individuals pay any fitness costs associated with individual 
learning whether they ultimately acquire a behavior by social learn- 
ing or by individual learning. 

Given these assumptions, we want to determine the conditions under which 
selection will favor individuals who rely significantly on social rather than indi- 
vidual learning. Consider a population that occupies an environment that can be 
in one of two distinct states: habitat 1 or habitat 2. Each individual in the pop- 
ulation will acquire one of two alternative behaviors, also labeled 1 and 2. As 
shown in Table 1.1, each individual has a “baseline” fitness W; individuals who 
acquire the behavior that is best in their environment achieve an increase in 
fitness, D. Thus, individuals that acquire behavior 1 have higher fitness in habitat 1 
than individuals that acquire behavior 2. Similarly, behavior 2 yields higher fitness 
in habitat 2 than does behavior 1. Once an individual has acquired one of the two 
behaviors, it does not change. Nor does the environment change, so that an 
individual experiences only one of the two environmental states during its lifetime. 

Table 1.1. Fitness associated with two behaviors 

             Behavior 1    Behavior 2 
Habitat 1    W + D         W 
Habitat 2    W             W + D 

The adaptive problem that faces each individual is to determine which of 
the two habitats it is in. Individuals in the model have two sources of infor- 
mation available to help them solve this problem. 

Each individual obtains evidence from its own experience: any observations, 
learning trials, or other nonsocial information that can help determine the state 
of the environment. We assume the result of each individual’s experience can be 
quantified in terms of a single normally distributed random variable, x. If the 
environment is in state 1, the mean value of x is M; if it is in state 2, the mean value 
of x is $-M$. In other words, the true state of the environment is either M or $-M$. 
Individuals acquire an imperfect estimate of the state of the environment, x, 
from personal experience. The standard deviation of the distribution of x, S, is an 
inverse measure of the quality of the evidence available to the members of the 
population. The larger S is, the poorer the individual’s estimate of the state of 
the environment. If $S \ll |M|$, then most individuals’ experiences will clearly 
indicate the state of the environment. If $S \gg |M|$, the results of gathering direct 
evidence will not be very informative. 

Assume that the population is structured into nonoverlapping cohorts. In- 
dividuals in one cohort can observe the behavior of individuals from the previous 
cohort who have already acquired either behavior 1 or behavior 2. Individuals in 
one cohort act as models for individuals in the next cohort. 

We imagine that individuals in the population use these sources of infor- 
mation to decide between the two alternatives in the following way: if the 
outcome of direct observation, x, is greater than a threshold value d ($d > 0$), the 
individual acquires behavior 1; if x is less than $-d$, then it acquires behavior 2. 
This is our attempt to capture the essence of the processes of individual learning. 
Finally, if $-d < x < d$, then the individual imitates the behavior of a single 
individual chosen at random from the population, its model. This, in turn, is our 
attempt to capture the essence of social learning. The order in which the two 
kinds of learning occur is not crucial; the model applies equally well to a situ- 
ation in which individuals begin by imitating others and then adopt a new be- 
havior only if confronted with decisive personal experience. 
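
The decision rule just described can be written out as a short sketch. The function names, the cohort size, and the parameter values below are hypothetical choices made for illustration; the only ingredients taken from the text are the threshold rule itself and the normally distributed experience x with mean M (habitat 1) and standard deviation S.

```python
import random


def acquire_behavior(x, d, model_behavior):
    """Apply the threshold rule to one individual's experience x.

    Returns 1 or 2. model_behavior is the behavior (1 or 2) of the randomly
    chosen individual that is copied when the evidence is indecisive.
    """
    if x > d:
        return 1  # decisive evidence for habitat 1
    if x < -d:
        return 2  # decisive evidence for habitat 2
    return model_behavior  # indecisive evidence: imitate the model


def next_cohort(models, d, M, S, rng):
    """Draw one new cohort in habitat 1 (mean experience +M, s.d. S)."""
    cohort = []
    for _ in models:
        x = rng.gauss(M, S)
        cohort.append(acquire_behavior(x, d, rng.choice(models)))
    return cohort


if __name__ == "__main__":
    rng = random.Random(0)
    models = [2] * 1000  # hypothetical starting cohort: everyone uses behavior 2
    for _ in range(20):
        models = next_cohort(models, d=1.5, M=0.5, S=1.0, rng=rng)
    print(sum(b == 1 for b in models) / len(models))
```

Because each naive individual copies only when its own evidence falls between the two thresholds, raising `d` in this sketch shifts weight from individual experience toward imitation without changing anything else about the rule.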

The parameter d serves two functions. First, as shown in figure 1.1, it is 
analogous to a confidence interval. The larger the value of d that characterizes 
the population, the more decisive the evidence must be before it will affect the 
individual’s decision. Second, the value of d simultaneously determines the 
relative importance of social learning and individual learning. We assume that 
when individuals are in doubt on the basis of their own experience, they utilize 
behaviors acquired by imitation. Let $p_1$ be the probability that $x > d$, and let $p_2$ 
be the probability that $x < -d$. If d is large, then individuals attend to their own 
experience only if it provides compelling evidence about the state of the 
environment (i.e., $p_1, p_2 \approx 0$). For the most part, they imitate another individual. 
If d is small, behavior is mainly determined by an individual’s experience, and 
social learning has little importance (i.e., $p_1 + p_2 \approx 1$). 

Figure 1.1. Illustrates the definition of $p_1$ and $p_2$ and their relationship to the parameter d. 
F(x) is the cumulative normal distribution, and f(x) is the normal density function. 
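
As a hedged illustration of how the two probabilities respond to the threshold, the sketch below evaluates them from the normal distribution of x. The function names and the example values of d, M, and S are assumptions made here; the definitions of the two probabilities are the ones given in the text.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def learning_probabilities(d, M, S, habitat=1):
    """Return (p1, p2) for threshold d.

    p1 = P(x > d) and p2 = P(x < -d), where x is normal with standard
    deviation S and mean +M in habitat 1 or -M in habitat 2.
    """
    mean = M if habitat == 1 else -M
    p1 = 1.0 - normal_cdf((d - mean) / S)
    p2 = normal_cdf((-d - mean) / S)
    return p1, p2


if __name__ == "__main__":
    for d in (0.0, 1.0, 3.0):
        p1, p2 = learning_probabilities(d, M=0.5, S=1.0)
        print(f"d={d}: p1={p1:.4f}, p2={p2:.4f}, imitate={1 - p1 - p2:.4f}")
```

The remainder, one minus the sum of the two probabilities, is the chance that an individual falls back on imitation; it is zero at d = 0 and approaches one as d grows.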


Effects of Learning on the Distribution of 
Behavior in the Population 

To predict the likelihood that an individual will acquire a particular behavior by 
social learning, we must know what behavior characterizes the individual's 
model. Suppose that a fraction $q_t$ of individuals in cohort t acquired behavior 1. 
A fraction $p_1$ of the naive individuals in cohort t will acquire behavior 1 based on 
their own experience, and a fraction $q_t(1 - p_1 - p_2)$ acquire alternative 1 by 
imitation. Thus, in cohort t the frequency of individuals acquiring behavior 1, $q_t'$, is 

$$q_t' = q_t(1 - p_1 - p_2) + p_1 \tag{1}$$

Now suppose that these individuals then serve as models for individuals in the 
cohort t + 1. Then the frequency of behavior 1 among the models for cohort 
t + 1, $q_{t+1}$, is approximately 

$$q_{t+1} = q_t' \tag{2}$$

We say “approximately” because we have ignored the effect of natural selec- 
tion. In environment 1, differential mortality will increase the frequency of 
behavior 1. Here we are assuming that the effect of learning on the relative 
frequencies of the two behaviors is so much greater than the effect of selection 
that selection can be safely ignored. 

Suppose that this process is repeated many times. That is, members of a 
cohort acquire their behavior by a combination of social and individual learning 
and then serve as models for the next cohort, and this process is repeated for 
many successive cohorts. Eventually the fraction of each cohort acquiring be- 
havior 1 will stabilize at the equilibrium value 


$$\hat{q} = \frac{1}{1 + p_2/p_1} \tag{3}$$
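
For readers who want the intermediate step, the equilibrium follows from equations 1 and 2 by setting $q_{t+1} = q_t = \hat{q}$, writing $\hat{q}$ for the equilibrium frequency of behavior 1. The short derivation below assumes nothing beyond those two equations; its last line also shows that the distance from equilibrium shrinks by the factor $1 - p_1 - p_2$ in each cohort.

```latex
\begin{align*}
\hat{q} &= \hat{q}\,(1 - p_1 - p_2) + p_1
  && \text{(equations 1 and 2 with } q_{t+1} = q_t = \hat{q}\text{)} \\
\hat{q}\,(p_1 + p_2) &= p_1 \\
\hat{q} &= \frac{p_1}{p_1 + p_2} = \frac{1}{1 + p_2/p_1} \\[4pt]
q_{t+1} - \hat{q} &= (1 - p_1 - p_2)\,(q_t - \hat{q})
  && \text{(subtracting the first line from equation 1)}
\end{align*}
```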


Thus, the fraction of individuals acquiring behavior 1 at equilibrium depends 
only on the ratio of the probability that an individual will choose alternative 2 
based on its own experience ($p_2$) to the probability that it will choose alternative 
1 based on its own experience ($p_1$). If $p_2/p_1 > 1$, then the equilibrium frequency 
of individuals choosing alternative 1 is less than half; if $p_2/p_1 < 1$, $\hat{q} > 1/2$. The 
fraction choosing alternative 1 at equilibrium does not depend (directly) on the 
relative importance of social learning versus individual learning in determining 
the behavior of individuals (i.e., on the magnitude of $1 - p_1 - p_2$). However, from 
equation 1 we know that the rate at which the population converges to the 
equilibrium value depends crucially on the amount of social learning. If there is 
little individual learning, $p_1$ and $p_2$ will be very small, and social learning will 
ensure that the population remains very similar from one generation to the next. 
Thus, as individual learning becomes less important in determining individual 
behavior, the population will converge more slowly to equilibrium. This property 
is crucial to our understanding of the evolution of mixed systems of social and 
individual learning in variable environments, as we will see. 
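
The contrast between where the population ends up and how fast it gets there can be checked numerically. The sketch below iterates equations 1 and 2 for two hypothetical parameter pairs that share the same ratio of the two individual-learning probabilities but differ a hundredfold in their sum; the function names, starting frequency, and tolerance are illustrative assumptions.

```python
def iterate_frequency(q0, p1, p2, cohorts):
    """Apply equations 1 and 2 repeatedly; return the list q_0, q_1, ..."""
    history = [q0]
    for _ in range(cohorts):
        history.append(history[-1] * (1.0 - p1 - p2) + p1)
    return history


def equilibrium_frequency(p1, p2):
    """Equation 3: equilibrium frequency of behavior 1."""
    return 1.0 / (1.0 + p2 / p1)


def cohorts_to_converge(q0, p1, p2, tol=1e-3, max_cohorts=1_000_000):
    """Number of cohorts until q is within tol of equilibrium."""
    target = equilibrium_frequency(p1, p2)
    q, t = q0, 0
    while abs(q - target) > tol and t < max_cohorts:
        q = q * (1.0 - p1 - p2) + p1
        t += 1
    return t


if __name__ == "__main__":
    # Same ratio p2/p1 = 0.25 in both cases, so the same equilibrium,
    # but very different amounts of individual learning.
    for p1, p2 in [(0.4, 0.1), (0.004, 0.001)]:
        print(p1, p2, equilibrium_frequency(p1, p2),
              cohorts_to_converge(0.0, p1, p2))
```

Both runs approach the same equilibrium, but the population that relies almost entirely on imitation needs far more cohorts to arrive, which is the slow-tracking property the text emphasizes.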


The Evolution of Social Learning 

We now consider the evolution of social learning. The relative importance of 
individual learning and social learning in determining phenotype is given by the 
parameter d. If d is affected by heritable genetic variation, then it will evolve 
under the influence of natural selection. We will model the evolution of d using 
the evolutionarily stable strategies (ESS) approach. That is, we assume that an 
individual's learning rule is affected by a genetic locus at which two alleles, a com- 
mon allele, H, and a very rare allele, h, are segregating. Most individuals in a 
population are characterized by the genotype HH, which results in them having a 
learning rule characterized by the parameter value d; however, there are a few 
rare mutant Hh individuals whose learning rule is characterized by a slightly 
different parameter value, $d + \delta$. We assume that the hh genotype is so rare that it 
can be neglected. We then determine the conditions under which the rare allele 
can invade. The ESS value of d is that value which prevents any rare alleles from 
invading. When the ESS value of d is very large, we will say that social learning is 
adaptive, since when d is large, most individuals will depend on social learning. 

As a first step in understanding the evolution of social learning, we calculate 
the ESS value of d, assuming that the environment is entirely in state 1 . In this 
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case, the expected fitness of an individual whose learning rule is characterized by 
the parameter d! in a population in which most individuals have a learning rule 
characterized by parameter d (where d! may or may not equal d) is given by: 

EKd')} = W + D[q{d)[ 1 - P x {d') - p 2 [d 0] + Px [d')} (4) 

where q(d) is the frequency of behavior 1 at the equilibrium value given in
equation (3), assuming that most individuals in the population are characterized
by learning parameter d. The rare allele, h, can invade the population if Hh
individuals whose learning rule is characterized by learning parameter d + δ have
a higher expected fitness than HH individuals whose learning rule is characterized
by learning parameter d, that is, if E{w(d + δ)} > E{w(d)}. Since δ is small,
E{w(d + δ)} ≈ E{w(d)} + δ(∂E{w(d)}/∂d), and so this condition can be rewritten in the
following form:

$$\delta\left[\frac{1}{p_1}\frac{\partial p_1}{\partial d} - \frac{1}{p_2}\frac{\partial p_2}{\partial d}\right] > 0 \qquad (5)$$

Suppose that the invading allele increases d, so that δ > 0. It follows from the
definitions of d, p_1, and p_2 that a given change in d causes a larger absolute
decrease in p_1 than in p_2, or ∂p_1/∂d < ∂p_2/∂d < 0. Thus, inequality (5) says that
the rare allele can invade whenever the percent decrease in the probability of
acquiring the wrong behavior by individual learning exceeds the percent decrease
in the probability of getting the right behavior by individual learning. It can
be shown that this expression is satisfied for all values of d. This means that the 
ESS value of d is as large as possible. 
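The claim that the invasion condition holds for every d can be checked numerically. The following is a minimal sketch, assuming that the environmental cue x is normally distributed with mean M and variance S in environment 1, so that p_1 = P(x > d) and p_2 = P(x < -d). The distributional form and the function names are assumptions of this sketch, not part of the argument above.

```python
import math

def upper_tail(x, mean, var):
    """P(X > x) for X ~ Normal(mean, var), using erfc for accuracy in the tail."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2.0 * var))

def density(x, mean, var):
    """Normal probability density at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def invasion_bracket(d, M=1.0, S=1.0):
    """Bracket in (5): (1/p_1) dp_1/dd - (1/p_2) dp_2/dd, in environment 1.

    Assumption of this sketch: x ~ Normal(M, S), p_1 = P(x > d), p_2 = P(x < -d).
    """
    p1 = upper_tail(d, M, S)
    p2 = upper_tail(d, -M, S)    # P(x < -d) = P(-x > d), and -x ~ Normal(-M, S)
    dp1 = -density(d, M, S)      # derivative of P(x > d) with respect to d
    dp2 = -density(-d, M, S)     # derivative of P(x < -d) with respect to d
    return dp1 / p1 - dp2 / p2

grid = [0.25 * k for k in range(17)]            # d = 0, 0.25, ..., 4
brackets = [invasion_bracket(d) for d in grid]
print(all(b > 0 for b in brackets))
```

Under this assumption the bracket stays positive across the grid, so an allele with δ > 0 satisfies (5) at each of the values of d that were tried.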

We draw two lessons from this simple result. First, some social learning is 
always better than relying completely on the results of experience. (That is, the 
expected fitness of an individual using a learning rule characterized by d = 0 is 
always less than the expected fitness of individuals using a learning rule
characterized by any positive value of d.) Second, in a population characterized by
the ESS value of d, individuals may virtually ignore the evidence presented 
by direct experience and depend entirely on social learning, even when the only 
cost associated with learning is the occasional error. 

It is important to notice that this result was derived assuming that every 
individual in every cohort experienced habitat 1. This assumption of an invariant
environment is crucial because, as we have seen, the equilibrium frequency of 
the superior variant does not depend on the amount of individual relative to 
social learning, but the rate of approach to that frequency does. In a variable 
environment, the expected fitness of individuals in the population likely will 
depend on the rate at which the population can respond to changes as well as the 
eventual equilibrium. 


Social Learning in Variable Environments 

To introduce environmental variation into the model, suppose that half of each 
cohort experiences environment 1 and the other half of each cohort experiences 
state 2. (The assumption that the habitats are the same size greatly simplifies the 
mathematical argument without altering the essential aspects of the problem.)
Let p_jk be the probability that an individual’s choice is based on direct experience and
that it results in behavior k given that the state of the environment is j. Because
of the symmetry of the model, the following is true:

$$p_{11} = p_{22}, \qquad p_{12} = p_{21} \qquad (6)$$

Variable environments are interesting in an evolutionary context only if 
events in one environment affect the other. Migration, a flow of behavioral 
variants from one environment into the other, will likely influence evolution in 
spatially variable environments. To model this effect, we suppose that there is a 
probability 1 - m that each model to whom a given individual is exposed
experienced the same environment that the given individual will experience, and
therefore a probability m that the model experienced the other environmental
state. Thus, m measures the effective rate of migration of individuals from one
habitat to the other. We assume throughout that 0 < m < 1/2. Let q_{t,j} be the
fraction of individuals that acquire behavior 1 within the subpopulation of
individuals that experience environmental state j in cohort t. Then the frequency
of behavior 1 in environment j after learning but before migration will be:

$$q'_{t,j} = q_{t,j}\,(1 - p_{j1} - p_{j2}) + p_{j1} \qquad (7)$$

and the frequency of models exhibiting alternative 1 in habitat j during cohort
t + 1 is

$$q_{t+1,1} = (1 - m)\,q'_{t,1} + m\,q'_{t,2}$$
$$q_{t+1,2} = (1 - m)\,q'_{t,2} + m\,q'_{t,1} \qquad (8)$$

Once again let us suppose that this process is repeated until a stable equilibrium 
is reached. Due to the assumed symmetry of the model, we know that any 
equilibrium at which both behaviors are present must satisfy 

$$q_1 = 1 - q_2 \qquad (9)$$

where q_1 is the fraction of individuals acquiring behavior 1 in environment 1, and
q_2 is the fraction of individuals acquiring behavior 1 in environment 2. Using this
fact one can show that

$$q_1 = \frac{(1 - 2m)\,p_{11} + m}{(1 - 2m)(p_{11} + p_{12}) + 2m} \qquad (10)$$

Notice that when m = 0, equation (10) reduces to the equilibrium derived in the
model without any environmental variation. Also notice that if individuals are
equally likely to imitate models drawn from both environments (i.e., m = 1/2),
then q_1 = 1/2. For intermediate values of m, q_1 falls between these two extreme
values.
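The equilibrium in (10) can be checked by iterating the recursion in (7) and (8) directly. The sketch below does this for illustrative values of p_11, p_12, and m. The numbers and function names are hypothetical and chosen only for the demonstration.

```python
def equilibrium_closed_form(p11, p12, m):
    """Equation (10): equilibrium frequency of behavior 1 in environment 1."""
    return ((1 - 2 * m) * p11 + m) / ((1 - 2 * m) * (p11 + p12) + 2 * m)

def iterate_recursion(p11, p12, m, q1=0.9, q2=0.9, cohorts=2000):
    """Iterate equations (7) and (8) for a fixed number of cohorts."""
    p22, p21 = p11, p12                            # symmetry, equation (6)
    for _ in range(cohorts):
        q1_learn = q1 * (1 - p11 - p12) + p11      # (7), environment 1
        q2_learn = q2 * (1 - p21 - p22) + p21      # (7), environment 2
        q1 = (1 - m) * q1_learn + m * q2_learn     # (8), habitat 1
        q2 = (1 - m) * q2_learn + m * q1_learn     # (8), habitat 2
    return q1, q2

p11, p12, m = 0.30, 0.05, 0.10                     # illustrative values only
q1, q2 = iterate_recursion(p11, p12, m)
print(q1, q2, equilibrium_closed_form(p11, p12, m))
```

Starting from an asymmetric initial state, the iteration settles at q_1 = 1 - q_2, as required by (9), and agrees with the closed form. Setting m = 0 or m = 1/2 in `equilibrium_closed_form` reproduces the two limiting cases described in the text.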

These properties make sense. In a uniform environment the behavior that 
results in higher fitness will increase in frequency according to the simplified 
model of the previous section; individuals should depend entirely on social 
learning and not take a chance on trial and error learning. When m = 0, there is
no contact between individuals who experience the different environments, and
the correct behavior in each environment becomes overwhelmingly common. 
Individual learning cannot do better than a perfected tradition, and it will fre- 
quently lead to errors. Within-cohort environmental variation, represented now 
by the movement of individuals among groups exposed to different environ- 
ments, causes individuals to be exposed to some immigrant models who are 
likely to have acquired the behavior favored by individual learning in the other 
environment. Therefore, the movement of models among groups in a spatially 
variable environment causes social learning to be a less reliable method of
acquiring one’s behavior than it is in a homogeneous environment. When m = 1/2,
the frequency of the superior behavior is increased in each environment by the 
effects of individuals’ experience, but the mixing of models from the two en- 
vironments exactly erases the gains, and the individuals in the next cohort must 
start from scratch. In this case, social learning is useless. 

The most interesting cases are the ones at intermediate values of m, where 
both social and individual learning are likely important. We will now compute
the ESS amount of social learning in a variable environment for 0 < m < 1/2.
The expected fitness of individuals using a learning rule characterized by the
learning parameter d' is given by

$$E\{w(d')\} = W + D\left[\,q_1(d)\bigl(1 - p_{11}(d') - p_{12}(d')\bigr) + p_{11}(d')\,\right] \qquad (11)$$

where q_1(d) is the equilibrium frequency of trait 1 in habitat 1, assuming that
virtually all of the population is characterized by learning parameter d. To de- 
termine the ESS value of d, the confidence-interval-like parameter that de- 
termines the relative importance of social and individual learning, we once again 
determine which value of d can resist invasion by modifying alleles. A population 
in which d predominates can resist invaders that increase d whenever: 
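A condition of this kind follows from (10) and (11). The display below is a reconstruction under that assumption, and it may differ from the printed form of inequality (12) by a positive factor. Differentiating (11) with respect to d' at d' = d gives the selection gradient. Substituting the equilibrium (10) for q_1 and clearing the positive denominator gives the condition for resisting alleles that increase d.

```latex
\left.\frac{\partial E\{w(d')\}}{\partial d'}\right|_{d'=d}
  = D\left[(1 - q_1)\frac{\partial p_{11}}{\partial d}
           - q_1\frac{\partial p_{12}}{\partial d}\right],
\qquad
q_1 = \frac{(1-2m)\,p_{11} + m}{(1-2m)(p_{11}+p_{12}) + 2m}

(1-2m)\left(p_{12}\frac{\partial p_{11}}{\partial d}
            - p_{11}\frac{\partial p_{12}}{\partial d}\right)
  + m\left(\frac{\partial p_{11}}{\partial d}
           - \frac{\partial p_{12}}{\partial d}\right) < 0
```

In this form the first term is (1 - 2m) p_11 p_12 times the bracket in (5), and the second term vanishes at d = 0 whenever the density of x is continuous there.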

Consider how varying d affects the sign of the left-hand side of expression (12). 
We know from the models of a constant environment that the first term on the 
left-hand side of (12) is always positive (see equation 5). It is clear from the 
definition of p_11 and p_12 (see fig. 1.1) that the second term equals zero when
d = 0, and is negative for all larger values of d. This means that when d = 0, the 
left-hand side of (12) will be positive and alleles that increase d can invade. Next 
notice that as d becomes large, both p_11 and p_12 approach zero, and therefore for
large enough values of d, the left-hand side of (12) is negative, and alleles that 
decrease d can invade. Taken together, these facts mean that expected fitness is 
maximized for some amount of social learning intermediate between zero and 
one as long as 1/2 > m > 0. While we have not been able to solve (12) analytically,
it is easy to solve numerically. The results, shown in figures 1.2 and 1.3, suggest 
that under a wide combination of migration rates and quality of individual ex- 
perience, it is optimal to employ a mixture of social and individual learning. 
There is a broad region with combinations of modest migration rates and 
moderate to low information quality where social learning should be rather more 
important than individual learning in determining individual behavior. In figure
1.2, the ESS value of d, d*, is plotted as a function of S, the measure of the
quality of the information available to individuals, and the probability that naive
individuals are exposed to models who learned from the wrong environment
(m). There are two things to notice about these results: first, as individual
experience becomes less reliable (i.e., S becomes large), the optimal amount of
social learning is increased. Second, as the environment becomes less predictable
(i.e., m increases), the optimal amount of social learning decreases. In figure 1.3,
we plot the probability that individuals rely on social learning (L* = 1 - p_11(d*) -
p_12(d*)), given that d equals its optimal value.

Figure 1.2. Plots the evolutionary equilibrium value of d, d*, as a function of the quality
of information available for individual learning, S, and for three levels of environmental
heterogeneity, measured by m.
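The numerical solution can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figures. It assumes that x is normal with mean M and variance S in environment 1, and it locates the ESS as the value of d at which the derivative of (11) with respect to d', evaluated at d' = d with q_1 taken from (10), changes sign. Under the normal assumption that sign equals the sign of ln(q_1/(1 - q_1)) - 2Md/S. All function names and parameter values are hypothetical.

```python
import math

def upper_tail(x, mean, var):
    """P(X > x) for X ~ Normal(mean, var)."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2.0 * var))

def learning_probs(d, M, S):
    """p_11 = P(x > d) and p_12 = P(x < -d) in environment 1 (normal cue assumed)."""
    return upper_tail(d, M, S), upper_tail(d, -M, S)

def q1_equilibrium(d, m, M, S):
    """Equation (10)."""
    p11, p12 = learning_probs(d, M, S)
    return ((1 - 2 * m) * p11 + m) / ((1 - 2 * m) * (p11 + p12) + 2 * m)

def gradient_sign(d, m, M, S):
    """Positive when alleles that increase d can invade (sketch assumption)."""
    q1 = q1_equilibrium(d, m, M, S)
    return math.log(q1 / (1 - q1)) - 2.0 * M * d / S

def ess_d(m, S, M=1.0, hi=50.0, tol=1e-10):
    """Bisection for d* on [0, hi]; requires 0 < m < 1/2."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gradient_sign(mid, m, M, S) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def social_learning_fraction(m, S, M=1.0):
    """L* = 1 - p_11(d*) - p_12(d*)."""
    p11, p12 = learning_probs(ess_d(m, S, M), M, S)
    return 1.0 - p11 - p12

for m in (0.1, 0.3):
    for S in (1.0, 4.0):
        print(m, S, ess_d(m, S), social_learning_fraction(m, S))
```

With these illustrative values, d* rises as S increases and falls as m increases, in line with the two qualitative patterns noted above.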

This model suggests that the adaptiveness of social learning relative to
individual learning depends on two factors: the accuracy of individual learning and
the chance that an individual’s social models experienced the same environment
that the individual experiences. A substantial dependence upon social learning
seems to be most adaptive when individual learning is inaccurate and there is not
too much migration among habitats. The occasional use of individually acquired
compelling evidence, coupled with faithful copying in the absence of such
evidence, is sufficient to keep the locally adaptive behavior common.
Increasing the importance of individual learning would entail more errors and
would reduce the frequency of the adaptive behavior. In contrast, when there is
extensive migration among habitats, relatively rare instances of individual
learning would not be sufficient to maintain a high frequency of the locally
adaptive behavior. Under such conditions, individuals must rely on individual
learning if they are to have any chance of acquiring locally adaptive behavior.

Figure 1.3. Plots the fraction of the population acquiring behavior by social learning
when d is at its equilibrium value, L* = 1 - p_11(d*) - p_12(d*), as a function of S and m.

Similar results derived using different models suggest that these conclusions 
are robust. We have analyzed the same dichotomous model in a temporally 
fluctuating environment (Boyd and Richerson, 1988). Assuming a Markov model 
of environmental change, we showed that the ESS reliance on social learning has 
the same qualitative properties as the model analyzed here. Elsewhere (Boyd and 
Richerson, 1985, ch. 4) we have analyzed a model that embodies the same 
qualitative assumptions about the nature of social learning and individual learning 
but in which behaviors are formalized as quantitative characters. These models 
have the same qualitative conditions for the evolution of social learning that result 
from this model. Finally, we have also extended the analysis of these models to 
allow for the genetic transmission of behavioral predispositions in addition to the 
genes that affect learning (Boyd and Richerson, 1983, 1985, ch. 4). 


Social Learning with More than One Model 

One can think of social learning as using the behavior of others as a source of in- 
formation about the environment. Adaptive processes such as individual learning 
will often cause the more common behavior to also be the most adaptive
behavior, and, therefore, copying the behavior of a randomly chosen individual can
be adaptive under the right circumstances. In many species, however, naive 
individuals may be able to observe the behavior of a number of experienced 
conspecifics. That is, each naive individual often has a set of models. When this is 
the case, one can think of such sets of models as samples of the behavior in the 
population. Then if there is behavioral variation in a population, different in- 
dividuals will be exposed to different samples of that behavior. Since different 
samples of behavior lead to different inferences about the commonness of one or 
the other behaviors in the population, it seems plausible that naive individuals 
exposed to different samples of behavior might differ in the extent to which they 
rely on social learning versus individual learning. 

To address this question, we have modified the model so that individuals are 
exposed to the behavior of n models. An individual’s models may differ in their 
behavior, and the naive individual can confront the problem of deciding which 
variant to adopt. There is also an opportunity afforded by a large set of models. 
Since individual learning will tend to increase the frequency of adaptive beha- 
viors in a local habitat, there may well be information in the model “sample” as 
to what behaviors are adaptive, especially as the size of the sample of the pre- 
vious generation increases. Selection might structure social learning to use this 
information. We want to determine the evolutionarily stable solutions to this 
problem. 

Begin by considering an individual exposed to i models using behavior 1 and
n - i models using behavior 2. Once again assume that the individual observes
the variable x that indicates the state of the environment and then adopts each
behavior with the probabilities given in table 1.2. As before, the value of d_i
determines the minimum quality of information necessary before the individual
will rely on individual learning. It is indexed by i to indicate that individuals may
have different thresholds depending on the number of models who use one
behavior or the other. We further assume that d_i = d_{n-i}. This assumption formalizes
the idea that it is the number of models who use a given behavior that governs the
usefulness of information acquired by social learning, not which trait they use.
The value of A_i determines the conditional probability that the individual will
acquire behavior 1 given that it is going to rely on social learning. To represent the
idea that there is no innate predisposition to adopt either trait in the absence of
information about the environment, we assume that A_i = 1 - A_{n-i}.

Table 1.2. Probability of acquiring behavior

Event              Behavior 1    Behavior 2
d_i < x            1             0
-d_i < x < d_i     A_i           1 - A_i
x < -d_i           0             1

As before, suppose that there are two habitats linked by migration, one in
which behavior 1 is favored and one in which behavior 2 is favored. Let the
frequency of behavior 1 among models in environment j be q_{t,j}. Further
suppose that models are sampled at random from the population. With these
assumptions, the frequency of behavior 1 in environment j after individual and
social learning, q'_{t,j}, is

$$q'_{t,j} = \sum_{i=0}^{n} \binom{n}{i} q_{t,j}^{\,i} (1 - q_{t,j})^{n-i} \left\{ A_i \bigl(1 - p_{j1}(d_i) - p_{j2}(d_i)\bigr) + p_{j1}(d_i) \right\} \qquad (13)$$

The frequencies in each habitat after migration are given by equation (8). 

The next step is to determine the equilibrium frequency of behavior 1 in 
each habitat. Because equation (13) is quite complex, we have not been able to 
derive an analytical expression for these equilibrium frequencies. However, 
it follows from the symmetry of the model that there is a stable symmetric 
equilibrium such that the favored behavior is common in each habitat, that is, 
q_1 = 1 - q_2. We will refer to this as the symmetric equilibrium. Depending
on the values of A_i and d_i, there may also be other stable internal equilibria at
which one behavior is common in both habitats. 

To determine an evolutionarily stable pattern of social learning, assume that 
most of the population has a learning rule characterized by the sets of parameters 
d = {d_0, ..., d_n} and A = {A_0, ..., A_n}, and that the population has reached the
resulting symmetric equilibrium. Then an individual with a different learning
rule characterized by the sets of parameters d' = {d'_0, ..., d'_n} and A' =
{A'_0, ..., A'_n} has expected fitness given by

$$E\{w(d', A')\} = W + D \sum_{i=0}^{n} \binom{n}{i} q_1^{\,i} (1 - q_1)^{n-i} \left\{ A'_i \bigl(1 - p_{11}(d'_i) - p_{12}(d'_i)\bigr) + p_{11}(d'_i) \right\} \qquad (14)$$

where q_1 is the frequency of the favored behavior in each habitat at the
symmetric equilibrium resulting from A and d. Then using the fact that A_i = 1 - A_{n-i}
and d_i = d_{n-i}, we can show that alleles that lead to a small increase in A_i can
invade if

$$q_1^{\,i} (1 - q_1)^{n-i} - q_1^{\,n-i} (1 - q_1)^{i} > 0 \qquad (15)$$

which is always satisfied for i > n/2. Thus, the ESS values of A_i, A*_i, are given by

$$A^*_i = \begin{cases} 1 & i > n/2 \\ 1/2 & i = n/2 \\ 0 & i < n/2 \end{cases} \qquad (16)$$

Given that an individual is going to rely on social learning, he should always 
adopt the more common behavior exhibited by his models. At the symmetric 
equilibrium the favored behavior is more common in each habitat. Thus, if 
individual experience is not determinative, the best thing to do is copy the 
behavior that is most common among models as it is more likely to be the locally 
favored behavior. 
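The advantage of copying the majority can be illustrated with a short calculation. The sketch below compares the ESS rule in (16) with copying a single randomly chosen model, assuming that the favored behavior has frequency q among models and that the n models are sampled independently. The function names and the numerical values are illustrative only.

```python
from math import comb

def ess_copy_probability(i, n):
    """Equation (16): probability of adopting the behavior shown by i of n models."""
    if 2 * i > n:
        return 1.0
    if 2 * i == n:
        return 0.5
    return 0.0

def prob_favored_by_copying(q, n):
    """Chance that a social learner ends up with the favored behavior.

    i counts models showing the favored behavior; models are drawn
    independently with frequency q (binomial sampling, as in (13)).
    """
    return sum(
        comb(n, i) * q**i * (1 - q) ** (n - i) * ess_copy_probability(i, n)
        for i in range(n + 1)
    )

q = 0.7
for n in (1, 3, 5, 9):
    print(n, prob_favored_by_copying(q, n))
```

With n = 1 the rule reduces to copying a random individual, so the result equals q. For larger n and q above 1/2, the majority rule does better than q, which is the sense in which the model sample carries information about the local environment.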
To determine the ESS value of d_i, d*_i, assume that the set of A_i are at their
ESS values given by (16). Then alleles which lead to a small increase in d_i can
invade if

$$\frac{\partial p_{11}}{\partial d_i}\, q_1^{\,i} (1 - q_1)^{n-i} - \frac{\partial p_{12}}{\partial d_i}\, q_1^{\,n-i} (1 - q_1)^{i} > 0 \qquad (17)$$

Substituting the definitions of p_11 and p_12 and simplifying yields the following
expression for the ESS value of d_i:

$$d^*_i = (S/M)(n/2 - i)\left[\ln q_1 - \ln(1 - q_1)\right] \qquad (18)$$

This expression says that when an equal number of models use each behavior 
(i = n/2), individuals should ignore their models and rely completely on
individual learning. As the number of models exhibiting one behavior increases,
d_i also increases linearly, and therefore the relative importance of individual
learning declines. This effect becomes stronger as the frequency of the favored 
behavior in each habitat increases and as the size of the set of models increases. 
When nearly everyone in a given habitat uses the optimal trait and the set of 
models gives clear indication which behavior is more common in the local 
habitat, then you should adopt the alternative behavior only if the evidence from 
your own experience is very strong. On the other hand, if both behaviors are 
almost equally common in both habitats, the fact that one behavior is common 
among your models gives little information about which behavior is favored 
locally (especially if the number of models is small), and individuals should 
mainly rely on their own experience. 
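The step from the invasion condition to the threshold can be written out as follows. This is a sketch under the assumption that x is normally distributed with mean M and variance S in habitat 1, which is what the factor S/M suggests, so that p_11(d) = P(x > d) and p_12(d) = P(x < -d) with density f. It treats the case i < n/2, for which the ESS values of A_i put all social learners on behavior 2; thresholds for i > n/2 then follow from d_i = d_{n-i}.

```latex
\frac{\partial p_{11}}{\partial d_i} = -f(d_i), \qquad
\frac{\partial p_{12}}{\partial d_i} = -f(-d_i), \qquad
\frac{f(d_i)}{f(-d_i)} = \exp\!\left(\frac{2 M d_i}{S}\right)

% At the ESS the invasion condition holds with equality:
f(d^*_i)\, q_1^{\,i} (1 - q_1)^{n-i} = f(-d^*_i)\, q_1^{\,n-i} (1 - q_1)^{i}
\;\Longrightarrow\;
\exp\!\left(\frac{2 M d^*_i}{S}\right) = \left(\frac{q_1}{1 - q_1}\right)^{n - 2i}

d^*_i = \frac{S}{M}\left(\frac{n}{2} - i\right)\left[\ln q_1 - \ln(1 - q_1)\right]
```

Read this way, the threshold is zero at i = n/2 and grows linearly with the size of the majority among the models, scaled by the log odds that a randomly drawn model shows the locally favored behavior.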


Discussion 

The models presented in this chapter lead to three qualitative conclusions about 
the evolution of social learning. First, the adaptiveness of social learning depends 
on a trade-off. Increasing the importance of social learning increases fitness be- 
cause it allows a reduction in the error rate of individual learning. However, 
increasing the importance of social learning also decreases the ability of the 
population to track a variable environment. A heavy dependence on social 
learning relative to individual learning seems to be most adaptive when individual 
learning is error-prone and environments are predictable. Second, the models 
suggest that when individuals do depend on social learning in a variable envi- 
ronment, they should not imitate randomly chosen individuals. Rather, they 
should tend to imitate the more common behavior among their models. This 
result follows from the fact that the behaviors favored by selection in a particular 
environment will tend to be more common in that environment. Finally, the 
models presented here suggest that selection will favor a pattern of social learning 
in which individuals exposed to more variable sets of models rely more heavily on 
individual learning. Given that models are numerous and sampled at random 
from the population, a predominance of one behavior among the models indi- 
cates that the behavior is more common in the population from which the models 
were drawn and, therefore, likely to be adaptive. An even mix of behavior among 
models indicates little about which behavior is common, especially if the number
of models is small. Therefore, it may make sense to depend heavily on individual 
learning. 

The models presented in this chapter can be thought of as a generalization of 
statistical decision theory. Within the context of that body of theory, decision 
makers seek to choose the best decision from among a set of possibilities, given 
specified information about the relationship between alternative decisions and 
outcomes. While this information may be imperfect, its statistical properties are 
specified, and they are independent of the decisions made by others. Given these 
assumptions, it is possible to specify the best decision procedures by considering 
each decision maker in isolation. Social learning involves decision makers who 
use the behavior of others as part of the information on which they base their 
decisions. The behavior of others depends on the decisions those individuals 
made, and therefore their decision rules. To specify the best rules for social 
learning, one must determine how a given decision rule affects the distribution of 
observed behavior in a population of decision makers. The models presented 
here provide one simple example of how this might be done in the context of the 
evolution of social learning. 

The models presented here are very general and should apply to many sit- 
uations in which animals could get information about the environment by ob- 
serving conspecifics. The apparent rarity, or at least lack of sophistication, of 
social learning in species besides humans (Galef, 1988) is a considerable puzzle 
given our results. The adaptive properties of social learning present an array of 
fascinating theoretical and empirical problems. 
2 Why Does Culture Increase
Human Adaptability?


Culture has made the human species a spectacular ecological suc- 
cess. Since the first appearance of tools and other evidences of culture in the 
archaeological record, the human species has expanded its range from part of 
Africa to the entire world, increased in numbers by many orders of magnitude, 
exterminated competitors and prey species, and radically altered the earth's biota. 

It is not clear, however, why culture improves human adaptability. There has 
been a lot written about this topic, often in the introductions to articles and books 
on other topics, but very little careful analysis. In previous work, we (e.g., Boyd 
and Richerson, 1985) suggested that social learning allows us to avoid the costs
of individual learning. Learning is costly, and without social learning every- 
body would have to learn everything for themselves. Teaching, imitation, and 
other forms of social learning, we argued, allow us to acquire a vast store of useful 
knowledge without incurring the costs of discovering and testing this knowledge 
ourselves. Recently, however, Alan Rogers (1989) has shown that this argument
is, at best, incomplete and, at worst, plain wrong. Using a mathematical model of 
the evolution of social learning, he showed that the fact that social learning allows 
individual organisms to avoid the costs of learning does not increase the ability of 
that species of organisms to adapt. In fact, in the long run, social learning has no 
effect at all on the evolving organism's average fitness. 

Here we have two goals: first, we argue that Rogers’s result is robust, not an 
artifact of the specific form of his model. To do this, we analyze two models that 
incorporate Rogers’s fundamental assumption that social learning allows in- 
dividuals to avoid the costs of individual learning, but incorporate quite different 
assumptions about how social learning works and how the environment varies. 
Because these models also show that social learning does not increase the average 
fitness, we conclude that Rogers’s result is robust. Culture will not increase
the ability of a population to adapt if its only benefit is to allow individuals to 
avoid learning costs. We then analyze two models of the evolution of social 
learning that incorporate different assumptions about the evolutionary benefit of 
social learning. They assume that social learning increases the fitness of in- 
dividuals who do not imitate by reducing the cost or increasing the accuracy of 
individual learning. In these models, culture does increase the average fitness of 
populations. 


Why Avoiding Learning Costs Does Not 
Increase Average Fitness 

Rogers’s Model 

Rogers’s conclusions are based on a mathematical model of the evolution of im- 
itation in a very simple hypothetical organism. These animals live in an envi- 
ronment that can be in one of two states; let’s call them wet and dry. The 
environment has a constant probability of switching from wet to dry each gen- 
eration, and the same probability of switching from dry to wet, which means that 
over the long run the environment is equally likely to be in each state. The 
probability of switching is a measure of the predictability of the environment. 
When this probability is high, knowing the state of the environment in one 
generation tells little about the state of the environment in the next generation. In 
contrast, when the probability of switching is low, the environment in the next 
generation is likely to be the same as the environment this generation. There are 
two behaviors available to the organism: one best in wet conditions and the other 
in dry conditions. There also are two genotypes — learners and imitators. Learners 
figure out whether the current environment is wet or dry and always adopt the 
appropriate behavior. However, the learning process is costly in that it reduces 
learners’ chances of survival or reproduction. Imitators simply pick a random 
individual from the population and copy it. Copying does not have any direct 
effect on survival or reproduction. Rogers then used some simple but clever 
mathematics to determine which genotype wins in the long run. 

The answer is surprising. The long-run outcome of evolution is always a 
mixture of learners and imitators in which both types have the same fitness as 
learners in a population in which there are no imitators. In other words, natural 
selection favors culture, but culture provides no benefit to the species. The 
organisms are no better off than they were without any imitation. 

To understand the logic of this result, think about the fitness of learners and 
imitators as the frequency of imitators changes. As shown in figure 2.1, when imi- 
tators are rare, they have higher fitness than learners. They are nearly certain 
to acquire the best behavior because the population is composed of almost all 
learners, and learners always acquire the right behavior. But imitators don’t suf- 
fer the cost of learning, so their fitness must be higher than learners. Thus, new 
mutations that give rise to copying will always be able to invade a population of 



DOES CULTURE INCREASE HU 


DAPTAB I LITY? 


37 


Average 

Fitness 



Frequency of Imitators 

Figure 2.1. The average fitness of learners and imitators as a function of the frequency 
of imitators in the population. The frequency of learners is one minus the frequency of 
imitators. This figure is redrawn from Rogers (1989). 


learners. On the other hand, when learners are rare, they have higher fitness than 
imitators. When there are very few learners, most of the imitators copy imitators 
who themselves copied imitators and so on. Because the environment changes 
periodically, this means that when learners are rare, imitators, in effect, choose 
behavior at random. In contrast, learners still acquire the best behavior. Thus, 
rare learners will be able to invade a population of imitators any time that the 
benefits of learning are sufficient to compensate for its costs. Because both types 
can increase when they are rare, the population will always be a mix of the two 
types. But only mixtures in which the two types have the same fitness can be 
stable long-run outcomes. Since the fitness of the learners is constant, it follows 
that the evolutionarily stable mix of learners and imitators has the same fitness as 
a population composed only of learners. 
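The logic can be illustrated with a small calculation. The sketch below is a simplified reading of the verbal description, not Rogers's own equations. It assumes that the environment switches with probability u each generation, that learners earn w0 + b - c, that an imitator earns w0 + b when its copied behavior is currently favored and w0 otherwise, and that imitators copy a random member of the previous generation. The parameter names w0, b, c, and u, and their values, are hypothetical.

```python
def imitator_accuracy(p, u):
    """Long-run chance that an imitator holds the favored behavior.

    Let z be the fraction of the whole population with the favored behavior.
    A copied behavior is right with probability (1 - u) * z + u * (1 - z),
    and z = (1 - p) + p * accuracy. Solving these two equations gives the
    expression below (valid for 0 < u < 1/2).
    """
    a = 1.0 - 2.0 * u
    return (1.0 - u - a * p) / (1.0 - a * p)

def learner_fitness(w0, b, c):
    return w0 + b - c                      # does not depend on p

def imitator_fitness(p, u, w0, b):
    return w0 + b * imitator_accuracy(p, u)

def equilibrium_imitator_frequency(u, w0, b, c, tol=1e-12):
    """Bisection for the mix at which both types have equal fitness.

    Assumes b * u < c < b / 2, so that each type can invade when rare.
    """
    lo, hi = 0.0, 1.0
    target = learner_fitness(w0, b, c)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imitator_fitness(mid, u, w0, b) > target:
            lo = mid                       # imitators still doing better
        else:
            hi = mid
    return 0.5 * (lo + hi)

u, w0, b, c = 0.1, 1.0, 1.0, 0.2
p_hat = equilibrium_imitator_frequency(u, w0, b, c)
mean_fitness = (p_hat * imitator_fitness(p_hat, u, w0, b)
                + (1 - p_hat) * learner_fitness(w0, b, c))
print(p_hat, mean_fitness, learner_fitness(w0, b, c))
```

Under these assumptions the imitators' fitness falls as they become common, the two curves cross at an interior frequency, and the mean fitness at that point equals the fitness of a population of learners alone.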


Two Extensions of Rogers’s Model 

One might think that this paradoxical result is an artifact. After all, the model is 
very simple. Perhaps if we add just a little realism, the paradox would go away. 
But such is not the case. We show that as long as the only benefit of imitation is 
the avoidance of learning costs, then changing rules of cultural transmission, the 
nature of environmental variability, and the number of traits leaves Rogers’s 
basic result unchanged. 
Spatially Varying Environment, More than Two
Behaviors, Learning Errors 

Rogers's model assumes that the environment varies in time but not space, that 
there are only two behaviors, and that learners always acquire the correct be- 
havior. Each of these assumptions can be changed without changing the quali- 
tative result. 

Consider a model in which organisms live in an environment that consists of 
a large number of discrete islands, each with a different environment in which a 
different behavior is favored by natural selection. The populations on different 
islands are linked by migration of individuals from each island to all other islands. 
Thus, in this model the rate of migration measures the predictability of the 
environment. If migration rates are high, individuals' environments are unlikely 
to be similar to their parents'. If migration rates are low, most individuals live in 
environments just like the one their parents lived in. Learners engage in costly 
learning trials that usually allow them to acquire the locally optimal behavior but 
also sometimes lead to errors. As shown in Appendix 1, this model yields the 
same qualitative result as Rogers’s model. Imitation evolves but does not benefit 
the population in the long run. 


Imitators Can Detect Learners 

Unlike the simple organisms in Rogers's model, humans do not blindly imitate a 
randomly chosen individual. Rather, they often evaluate the behavior of many 
individuals and choose the one that seems best, a process we have labeled biased 
transmission (Boyd and Richerson, 1985). Once a beneficial innovation arises, 
biased transmission allows it to spread through a population without further 
individual learning. Thus, it seems plausible that if Rogers’s model were ex- 
tended to allow biased transmission, the average fitness of the population might 
increase. However, a little analysis shows that this intuition is wrong. 

Consider a model in which there are learners and imitators. As before, 
learners always acquire the currently favored behavior but at some cost. After 
learners learn, each imitator surveys the behavior of n individuals living in his 
social group. Imitators query each potential model to find out whether he ac- 
quired behavior by copying or by learning. If there is even a single learner in their 
group, imitators copy the learner and thereby acquire the behavior that is best in 
the current environment. If there are no learners, imitators copy a randomly 
chosen individual. This model allows imitators a great deal more information 
than Rogers’s model: they can imitate n others rather than one, and they don’t 
copy at random. However, as is shown in Appendix 2, the qualitative result is 
exactly the same — both types are present, and their long-run average fitness is 
the same as a pure population of learners. 


Why Rogers’s Result Is Robust 

As Rogers argued in his original article, his result is robust because it reveals a 
basic evolutionary property of social learning: the advantage that imitators get
from avoiding learning costs cannot increase fitness of a population because the
frequency of imitators will increase until this advantage is exactly balanced by 
the disadvantage that imitators often acquire the wrong behavior. The funda- 
mental logic underlying Rogers’s result can be represented graphically as in figure 
2.1, which plots the expected fitness of learners and imitators as a function of 
the fraction of imitators in the population. The fitness of imitators declines as the 
frequency of imitators increases because the more imitators there are, the more 
poorly the population tracks the changing environment, the lower the frequency 
of adaptive behavior, and, therefore, the dumber it is to copy. Moreover, there 
always have to be some learners in the population, because a population con- 
sisting only of imitators behaves at random. Thus, the expected fitness of imi- 
tators and learners has to be the same at equilibrium. But the fitness of learners 
isn’t affected by the number of imitators. Thus, at equilibrium the average fitness 
of the population is the same as that of a population without culture. 


How Culture Can Increase Average Fitness 

Thinking about the problem this way points to its solution. Social learning would 
improve the average fitness of a population if it increased the fitness of learners as 
well as imitators. Consider figure 2.2. Here, we assume that the average fitness 
of learners increases as the frequency of imitators increases, and the paradox 
disappears: learners and imitators still have the same fitness at equilibrium, but
now that fitness is higher than for a population composed entirely of learners.
Thus, to improve the average fitness of the population, imitation must make
individual learning cheaper or more accurate.

Figure 2.2. If increasing the frequency of imitators reduces the cost or increases the
accuracy of individual learning, then the average fitness of the population can be increased
by imitation. (Vertical axis: average fitness.)

Of course, this formal possibility would be of little importance if there were 
no plausible means by which increasing the amount of imitation would cause in- 
dividual learning to be more efficient. However, we suggest that there are at least 
two ways that imitation can benefit learners. 

Imitation Allows Selective Learning 

Imitation can increase the average fitness of learners by allowing individuals to 
learn more selectively. Learning opportunities often vary. Sometimes it may be 
easy to determine the best behavior while other times it may be very difficult. 
Without imitation, an organism must rely on learning even when it is difficult 
and error-prone. In contrast, an imitating organism can learn when learning is 
cheap and accurate and imitate when it is costly or inaccurate. The following 
model shows that imitation plus selective learning can increase average fitness in 
a population even when most individuals imitate. 

As before, consider a population that lives in an environment that switches 
between two states, and assume that there are two behaviors, one best in each 
environmental state. However, now suppose that all individuals attempt to dis- 
cover the best behavior in the current environment. Each individual experiments 
with both behaviors and then compares the results. The results of such experi- 
ments vary for many reasons, and, thus, the behavior that is best during any par- 
ticular trial may be inferior over the long run. To avoid errors, individuals adopt a 
particular behavior only if it appears sufficiently better than its alternative. The 
larger the observed difference in the payoffs between the two behaviors, the more 
likely that the behavior with the higher payoff actually is best. By insisting on a 
large difference in observed payoff, individuals can reduce the chance that they 
will mistakenly adopt the inferior behavior. Of course, being selective will also
cause more trials to be indecisive, and when a trial is indecisive, individuals imitate a
randomly chosen individual. Thus, there is a trade-off: you can increase the accuracy of
learning, but only by also increasing the probability that learning will be indecisive and
that you will have to rely on imitation. The exact nature of the trade-off depends on
the probability distribution of the outcome of learning trials. In Appendix 3, we
analyze a model in which the observed difference in payoffs is a normal random
variable. For one set of parameters (mean = 0.5, variance = 1), the relationship
between imitation and the accuracy of learning has the form shown in figure 2.3.
If the individual adopts a behavior any time that it yields a higher payoff dur- 
ing the learning trial, it will acquire the wrong behavior around 30 percent of 
the time. If it requires a larger difference in payoffs, then it can reduce the chance 
of such errors, but sometimes it will have to imitate. If it is sufficiently picky, 
it will almost never err, but it will also almost always acquire its behavior by 
imitation. 
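
The trade-off described in this paragraph can be illustrated numerically. The following is a minimal sketch, assuming (as in Appendix 3) that the observed payoff difference is normal with mean 0.5 and variance 1, and that an individual adopts a behavior only when the difference exceeds a threshold d in absolute value. The function names and the particular thresholds printed are illustrative choices, not part of the original analysis.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def learning_outcomes(d: float, m: float = 0.5):
    """Return (p_correct, p_error, p_imitate) for a decision threshold d.

    The observed payoff difference x is assumed Normal(m, 1).
    x > d   -> adopt the favored behavior (correct)
    x < -d  -> adopt the other behavior (error)
    else    -> the trial is indecisive and the individual imitates
    """
    p_correct = 1.0 - normal_cdf(d - m)   # Pr(x > d)
    p_error = normal_cdf(-d - m)          # Pr(x < -d)
    p_imitate = 1.0 - p_correct - p_error
    return p_correct, p_error, p_imitate


# Illustrative thresholds: d = 0 means "never imitate".
for d in (0.0, 0.5, 1.0, 2.0, 3.0):
    ok, err, imi = learning_outcomes(d)
    print(f"d={d:.1f}  correct={ok:.3f}  error={err:.3f}  imitate={imi:.3f}")
```

With d = 0 the individual never imitates and errs a little under a third of the time; as d grows, errors become rare but nearly every trial ends in imitation.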

Figure 2.3. The trade-off between imitation and learning, assuming that the outcomes of
learning trials are normally distributed, with mean equal to 0.5 and variance equal to 1.0.

To model the evolution of social learning, we assume that an individual’s
position on this continuum is a genetically heritable trait. Suppose that most
individuals use a learning rule that causes them to imitate x percent of the time; we
call these “common-type individuals.” There are also a few rare “mutant”
individuals who imitate slightly more often. Compared to the common type,
mutants are less likely to make learning errors. Thus, when mutants learn, they 
have higher fitness than the common-type individuals when they learn. When 
mutants imitate, they have the same fitness as the common type. However, mu- 
tants must imitate more often, and imitators always have lower fitness than 
learners. To see why, think of each imitator as being connected to a learner by a 
chain of imitation. If the learner at the end of the chain learned in the current 
environment, then the imitator has the same chance of acquiring the favored 
behavior as does a learner. If the learner at the end of the chain learned in a 
different environment, the imitator will have a lower chance of acquiring the best 
behavior. Thus, the mutant type will have higher fitness if the advantage of 
making fewer learning errors is sufficient to offset the disadvantage of imitating 
more. 

This evolutionary trade-off depends on how much the common type imi- 
tates. When the common type rarely imitates, the fitnesses of individuals who 
imitate and individuals who learn will be similar because most imitators will imi- 
tate somebody who learned, and, therefore, the fact that mutants make fewer 
learning errors will allow them to invade. However, as the amount of imitation 
increases, the fitness of imitating individuals relative to those who learn declines 
because increased imitation lengthens the chain connecting each imitator to a 
learner. Eventually an equilibrium is reached at which the common type can 
resist invasion by mutants that change the rate of imitation. We refer to the 
fraction of time that the common type imitates at equilibrium as the “evolutionary
equilibrium amount of imitation.”

Figure 2.4. Individuals either learn or imitate according to the outcome of their learning
trial. As individuals become more selective, the frequency of imitating individuals
increases. This figure plots the expected fitness of individuals who imitate and those who
learn as a function of the frequency of imitating individuals, assuming the outcome of
learning experiments is normally distributed with mean 0.5 and variance 1. (Vertical axis:
expected fitness.)

The average fitness of a population at the evolutionary equilibrium is greater
than the average fitness of individuals who do not imitate as long as the
probability that the environment changes is less than half (see Appendix 3 for a
formal proof). You can get an intuitive feel for why by considering figure 2.4,
which plots the average fitness of imitating and learning individuals as a function
of the fraction of common-type individuals who imitate. The fitness of learning
individuals increases as the amount of imitation increases because learners make
fewer errors. The fitness of imitating individuals also increases at first because
they are imitating learners who make fewer errors. If imitation is common
enough, however, the fitness of imitating individuals eventually declines because the
population fails to track the changing
environment. The first effect is apparently sufficient to lead to a net increase in
average fitness at evolutionary equilibrium.

It is important to understand that this increase in average fitness is only a 
side effect of selection at the individual level. The evolutionary equilibrium 
amount of imitation does not maximize the average fitness of the population. 
Selection at the individual level favors more imitation than is optimal for the 
population because it ignores the effect on the population as a whole of in- 
creased imitation, and after a certain point this effect is deleterious. 

Imitation Allows Cumulative Improvement 

Imitation may increase the average fitness of learners by allowing learned im- 
provements to accumulate from one generation to the next. So far we have 
considered only two alternative behaviors. Thus, learning is an either/or propo- 
sition. Many kinds of behaviors admit successive improvements toward some 
optimum. Individuals start with some initial “guess” about the best behavior and 
then invest time and effort at improving their performance. For a given amount
of time and effort, the better an individual's initial guess, the better on average its
final performance. Now, imagine that the environment varies, so that different 
behaviors are optimal in different environments. Organisms who cannot imitate 
must start with whatever initial guess is provided by their genotype. They can 
then learn and improve their behavior. However, when they die, these im- 
provements die with them, and their offspring must begin again at the genetically 
given initial guess. In contrast, an imitator can acquire its parents' behavior after 
their behavior has been improved by learning. Therefore, it will start its search 
closer to optimal behavior, and for a given amount of searching, it will achieve a 
better adult phenotype. Thus, if the learning cost per unit improvement is smaller 
for small improvements than for big ones, imitation makes learning more efficient 
and therefore increases the average fitness of the population. 

The following simple model illustrates this idea (a more realistic model with
the same properties is analyzed in Boyd and Richerson, 1985, ch. 4). Consider an
organism that lives in an environment that can be in a continuum of states. For
example, suppose that the population density of prey species varies. In each
generation there is a chance that the environment switches to a new state (more
or less prey), but also some chance that it remains unchanged. There is also a
continuum of behaviors, such as the amount of effort devoted to foraging versus 
hunting. We measure the environmental state in terms of the optimal behavior 
in that environment and assume that an individual’s fitness decreases as the dif- 
ference between the environmental state and its behavior value increases. 

All individuals modify their behavior by learning. Each individual begins with 
an initial guess about the state of the environment and then experimentally modifies 
this behavior. In doing so, individuals reduce the difference between their behavior 
and the optimum behavior in the current environment. Learning is costly — 
individuals who devote more time and effort to experimenting suffer greater 
learning costs but move closer to the current optimum. There are two genotypes. 
Learners use a fixed, genetically inherited norm of reaction as their initial guess 
about the environment, and they always acquire the optimum behavior. Imitators ac- 
quire their initial guess by imitating the behavior of a randomly chosen member of 
the previous generation. They invest much less in learning than do learners and, as a 
result, improve on their initial behavior only a small amount. However, as long as 
the environment does not change, the population of imitators will converge slowly 
toward the optimum as each generation moves toward the optimum. Thus, imi- 
tators may start their learning nearer to the optimum than do learners. 
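
The slow convergence of a population of imitators can be sketched in a few lines. This is only an illustration under stated assumptions: the environment is held constant, each generation copies the previous generation's behavior and then closes a fixed fraction a of the remaining gap to the optimum (the update rule used in Appendix 4), and the numerical values of a, the starting behavior, and the optimum are hypothetical.

```python
def imitator_trajectory(z0: float, theta: float, a: float, generations: int):
    """Behavior passed down a line of imitators in a constant environment.

    z0     -- behavior of the founding generation (hypothetical value)
    theta  -- optimal behavior in the current environment
    a      -- fraction of the gap to the optimum closed by each generation's learning
    """
    z = z0
    path = [z]
    for _ in range(generations):
        # Copy the previous generation, then move a small step toward the optimum.
        z = a * theta + (1.0 - a) * z
        path.append(z)
    return path


path = imitator_trajectory(z0=0.0, theta=1.0, a=0.1, generations=30)
gaps = [abs(1.0 - z) for z in path]
print(f"gap after 0, 10, 30 generations: {gaps[0]:.3f}, {gaps[10]:.3f}, {gaps[30]:.3f}")
```

In this sketch the gap shrinks geometrically, by a factor (1 - a) per generation, so small individual improvements compound as long as the environment stays put.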

Imitators have higher fitness at evolutionary equilibrium in this model as 
long as (1) the environment does not change too often compared to the rate at
which the population of imitators converges toward the optimum, and (2) 
learners suffer substantially greater learning costs than do imitators. If the envi- 
ronment changes slowly enough, the gradual cumulative improvement achieved 
by imitators will be sufficient to ensure that their behavior is near the current 
optimum most of the time. Of course, imitators will never track the environ- 
ment as accurately as learners, but if the small improvements realized by imi- 
tators are cheaper than the large improvements of learners, imitators will have 
higher average fitness. Because only imitators are present at such an equilibrium, 
imitation increases average fitness.


Discussion

Culture increases average fitness if it makes the learning processes that generate 
new knowledge less costly or more accurate. Culture may do this in at least two 
ways: first, social learning allows individual learning to be selective. Individuals 
can learn opportunistically when it is likely to be more accurate or less costly and 
imitate when conditions are less favorable. Second, social learning allows learned 
improvements to accumulate from one generation to the next. When learning
in small steps is less costly per unit improvement in fitness than learning in
large steps, the cumulative learning over many generations can increase average
fitness.

These results help us understand the importance of the evolution of true 
imitation. There are a number of examples of social traditions in other animals. 
For example, some populations of chimpanzees in West Africa regularly use 
stone tools to crack open tough nuts, while other nearby populations never use 
stones to crack nuts. The stones and nuts are available to both populations, and 
the environments are otherwise very similar (Boesch et al., 1994). Students of
social learning in nonhuman animals (e.g., Galef, 1988; Visalberghi and Fragaszy,
1990) distinguish two classes of processes that could maintain such cultural
differences between different populations: social enhancement occurs when the 
activity of older animals increases the chance that younger animals will learn the 
behavior on their own. Young individuals do not acquire the behavior by
observing older individuals. Social enhancement could cause tool use to persist in
some populations but not others, as in the following scenario: in populations in
which chimpanzees use tools to crack nuts, young chimpanzees spend a lot of 
time in proximity to both nuts and hammer stones. Nuts are a greatly desired 
food, and young chimpanzees find eating nutmeats highly reinforcing. Young 
chimpanzees experiment with the hammers and anvils until they master the skill 
of opening the nuts. In populations in which chimpanzees do not use stones to 
open nuts, young chimpanzees never spend enough time in proximity to both 
nuts and hammer stones to acquire the skill. Imitation occurs when younger 
animals observe the behavior of older animals and learn how to perform the 
behavior by watching them. In this case, the tradition is preserved because young 
chimpanzees actually imitate the behavior of older chimpanzees. 

Students of animal social learning have distinguished between social en- 
hancement and imitation because the necessary psychological mechanisms are 
quite different. Our results suggest that this distinction is also of evolutionary 
importance because selective social learning and cumulative culture change are 
possible only when there is imitation. Social enhancement can preserve variation 
only in behavior that organisms can learn on their own, albeit in favorable cir- 
cumstances, but it does not allow individuals to avoid learning when information 
is poor or costly. Even more important, only imitation allows cumulative cultural
change. Suppose that on her own in especially favorable circumstances an
early hominid learned to strike rocks together to make useful flakes. Her
companions, who spent time near her, would be exposed to the same kinds of
conditions, and some of them might learn to make flakes too, entirely on their
own. This behavior could be preserved by social enhancement because groups in
which tools were used would spend more time in proximity to the appropriate 
stones. However, that would be as far as it would go. Even if an especially tal- 
ented individual found a way to improve the flakes, this innovation would not 
spread to other members of the group because each individual learns the behavior 
anew. With imitation, on the other hand, innovations can persist as long as 
younger individuals are able to acquire the modified behavior by observational 
learning. As a result, imitation can lead to the cumulative evolution of behaviors 
that no single individual could invent on his own. 

Recent reviews (Galef, 1992; Tomasello, 1990; Visalberghi and Fragaszy,
1990) suggest that all known cases of animal social traditions can be explained as
the result of social enhancement. If this is correct, our results explain why animal 
cultures seem to play such a small role in the lives of such species. It also suggests 
that understanding the evolution of the psychological mechanisms that allow 
imitation is of key importance for understanding human evolution. 


APPENDIX 1: Spatially Varying Environment, More than
Two Variants, Learning Errors


Consider an organism that lives in a spatially varying environment in which there are
a large number of islands. A different behavior is favored on each island so that the
fitness of behavior i on island j is

\[
W_{ij} =
\begin{cases}
W_0 + D & \text{if } i = j \text{ (behavior } i \text{ in environment } i\text{)} \\
W_0 - D & \text{if } i \neq j \text{ (behavior } i \text{ in environment } j\text{)}
\end{cases}
\tag{A1.1}
\]

There are two genotypes:


Learners = Discover locally optimal behavior with probability 1 - e.
Imitators = Imitate a randomly chosen individual from the previous 
generation. 

After learning and imitating, a fraction m of the individuals on each island emigrate 
and are replaced by individuals drawn from all other islands at random. Because the 
number of behaviors is large, the frequency of the favored behavior among immi- 
grants is approximately zero. 

After migration, selection occurs. We assume that selection is weak so that the
frequency of learners and imitators is the same on all islands. Then let


q = frequency of imitators on the focal island. 
p = frequency of the locally favored behavior among imitators. 

The probability that an imitator encounters a single individual who has the
locally optimal behavior is $(1 - q)(1 - e) + qp$, and thus the frequency of the locally
optimal trait among imitators after imitation, $p'$, is

\[ p' = (1 - q)(1 - e) + qp \tag{A1.2} \]

And after migration the frequency of the favored behavior among imitators, $p''$, is

\[ p'' = (1 - m)\left[(1 - q)(1 - e) + pq\right] \tag{A1.3} \]

Thus, there is a unique stable equilibrium frequency of the locally favored variant, $\hat{p}$:

\[ \hat{p} = \frac{(1 - m)(1 - e)(1 - q)}{1 - q(1 - m)} \tag{A1.4} \]


The average fitness of learners is $W_L = W_0 + D(1 - 2e) - C$, where $C$ is the cost of
individual learning. The average fitness of imitators is $W_I = W_0 + D(2\hat{p} - 1)$. If imitators
are rare ($q \approx 0$), then the equilibrium frequency of the favored variant among rare
imitators is approximately $(1 - e)(1 - m)$, the same frequency as among learners, and
since imitators incur no learning cost, they increase in frequency. If imitators are
common ($q \approx 1$), then the equilibrium frequency of the favored variant is zero, and therefore
imitators have lower fitness than do learners as long as learning pays [$D(1 - 2e) > C$].
Since $\hat{p}$ is a monotonically decreasing function of $q$, there is a unique stable equilibrium
value of $q$ at which imitators have the same fitness as learners.
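
The equilibrium described above can be located numerically. The sketch below assumes the equilibrium frequency of the favored behavior among imitators, (1 - m)(1 - e)(1 - q) / (1 - q(1 - m)), and the two fitness expressions stated in this appendix. The bisection routine, the function names, and the parameter values are illustrative assumptions rather than part of the original analysis.

```python
def p_hat(q: float, m: float, e: float) -> float:
    """Equilibrium frequency of the locally favored behavior among imitators."""
    return (1 - m) * (1 - e) * (1 - q) / (1 - q * (1 - m))


def fitness_learner(W0: float, D: float, C: float, e: float) -> float:
    return W0 + D * (1 - 2 * e) - C


def fitness_imitator(W0: float, D: float, q: float, m: float, e: float) -> float:
    return W0 + D * (2 * p_hat(q, m, e) - 1)


def equilibrium_q(W0, D, C, m, e, tol=1e-12):
    """Bisection for the imitator frequency at which both types are equally fit.

    Relies on p_hat decreasing in q, so imitator fitness crosses learner
    fitness at most once on [0, 1].
    """
    w_l = fitness_learner(W0, D, C, e)
    lo, hi = 0.0, 1.0
    if not (fitness_imitator(W0, D, lo, m, e) > w_l > fitness_imitator(W0, D, hi, m, e)):
        raise ValueError("no interior equilibrium for these parameters")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fitness_imitator(W0, D, mid, m, e) > w_l:
            lo = mid   # imitators still do better: equilibrium lies at higher q
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values chosen only for illustration.
params = dict(W0=1.0, D=1.0, C=0.5, m=0.1, e=0.05)
q_eq = equilibrium_q(**params)
print(f"equilibrium frequency of imitators: {q_eq:.4f}")
print(f"learner fitness:  {fitness_learner(1.0, 1.0, 0.5, 0.05):.4f}")
print(f"imitator fitness: {fitness_imitator(1.0, 1.0, q_eq, 0.1, 0.05):.4f}")
```

At the computed equilibrium the two fitnesses coincide, so the mixed population does no better on average than a population of learners alone.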


APPENDIX 2: Imitators Can Identify Learners 

Consider an organism that lives in an environment that can be in one of two states. 
Each generation there is a probability y that the environment switches from one state 
to the other. There are two behaviors with fitnesses as given in table A2.1: 


Table A2.1. Fitness in environments 1 and 2

              Environment 1    Environment 2
Behavior 1    W_0 + D          W_0 - D
Behavior 2    W_0 - D          W_0 + D


There are two genotypes: 

Learners = Always acquire the best behavior in the current environment
but at a cost C.

Imitators = Observe n individuals after learning. If there is a learner 
among these individuals, imitators acquire the best behavior 
in the current environment. Otherwise they copy a random 
individual from within the group. 

And let q equal the frequency of imitators, and p the frequency of the currently 
favored behavior among imitators. Assume that selection is sufficiently weak so 
that the effect of selection on cultural evolution can be ignored (i.e., on dynamics of 
p ), and genetic evolution (the dynamics of q) responds to the stationary distribution 
of p. 

Then the frequency of the currently favored behavior after learning and imitation, $p'$, is

\[
p' =
\begin{cases}
1 - q^n + q^n p & \text{if no environmental change} \\
1 - q^n + q^n (1 - p) & \text{if environment changes}
\end{cases}
\tag{A2.1}
\]

Suppose at some time $t$ the probability density for $p$ is $f_t(p)$ with mean $P_t$. Then the
mean of $f_{t+1}(p)$ is given by

\[
P_{t+1} = \int \left[ (1 - y)\left(1 - q^n + q^n p\right) + y\left(1 - q^n + q^n (1 - p)\right) \right] f_t(p)\, dp
\tag{A2.2}
\]

where y is the probability that the environment switches states. Integrating and 
simplifying yields the following recursion for P t : 

\[ P_{t+1} = 1 - q^n + q^n\left[(1 - 2y)P_t + y\right] \tag{A2.3} \]

Thus, the equilibrium value of the mean frequency of the favored behavior is:

\[ \hat{P} = \frac{1 - q^n + q^n y}{1 - q^n (1 - 2y)} \tag{A2.4} \]


The average fitness of learners is $W_L = W_0 + D - C$, which is independent of changes
in the environment. The average fitness of imitators once $P_t$ has reached its
equilibrium value is $W_I = W_0 + D(2\hat{P} - 1)$. The frequency of imitators will increase
whenever $W_I > W_L$. Substituting the expression for $\hat{P}$ given in equation A2.4 and
solving for $q$ yields the following inequality:

\[ q < \left[ \frac{C/D}{2y(1 - C/D) + C/D} \right]^{1/n} \equiv q^* \tag{A2.5} \]

Thus, $q^*$ is a unique stable equilibrium value for the frequency of imitators, and at
this frequency the average fitness of imitators and learners is equal.
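
The result of this appendix can be checked numerically. The sketch below iterates the recursion for the mean frequency of the favored behavior, P(t+1) = 1 - q^n + q^n[(1 - 2y)P(t) + y], and compares the fitness of the two types at the imitator frequency where they are equal. The closed-form expression used for that frequency is obtained here by setting the two fitness expressions equal and solving for q, and should be read as our reconstruction; the function names and parameter values are hypothetical.

```python
def iterate_mean_frequency(q, n, y, P0=0.5, steps=1000):
    """Iterate P_{t+1} = 1 - q^n + q^n[(1 - 2y) P_t + y] to (near) equilibrium."""
    Q = q ** n          # probability that all n observed models are imitators
    P = P0
    for _ in range(steps):
        P = 1 - Q + Q * ((1 - 2 * y) * P + y)
    return P


def equilibrium_mean_frequency(q, n, y):
    """Fixed point of the recursion above."""
    Q = q ** n
    return (1 - Q + Q * y) / (1 - Q * (1 - 2 * y))


def q_star(n, y, C, D):
    """Imitator frequency at which W_I = W_L (derived by equating the two fitnesses)."""
    c = C / D
    return (c / (2 * y * (1 - c) + c)) ** (1.0 / n)


# Hypothetical parameter values for illustration only.
W0, D, C, n, y = 1.0, 1.0, 0.2, 3, 0.1
qs = q_star(n, y, C, D)
P_eq = equilibrium_mean_frequency(qs, n, y)
W_learner = W0 + D - C
W_imitator = W0 + D * (2 * P_eq - 1)
print(f"q* = {qs:.4f}, P at equilibrium = {P_eq:.4f}")
print(f"W_L = {W_learner:.4f}, W_I = {W_imitator:.4f}")
```

Even though imitators here survey several individuals and copy a learner whenever one is present, the two types end up with the same fitness, which equals that of a population of learners.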


APPENDIX 3: Selective Learning 

Consider an organism that lives in an environment that can be in one of two states. 
Each generation there is a probability y that the environment switches from one state 
to the other. There are two behaviors with fitnesses as given in table A2.1.

Each individual performs a learning trial in which it estimates the payoff of 
each behavior in the current environment. The difference between the payoff of the 
currently favored behavior and that of the alternative behavior observed by each 
individual is an independent, normally distributed, random variable, x, with mean 
equal to m, and variance equal to 1. The mean, m, is positive because, on average, the
currently favored behavior yields a higher payoff in the current environment. All 
individuals use the learning rule: 


Outcome of learning trial    Decision
x > d                        Adopt favored behavior
d > x > -d                   Imitate
-d > x                       Adopt other behavior


The threshold parameter d determines how selectively individuals learn.
Individuals regard trials that yield positive outcomes greater than d as decisive evidence
that the environment is in the state that is currently favored, and trials in which x is
less than -d as decisive evidence that the environment is in the other state. When a
trial produces an outcome in between d and -d, it is indecisive and individuals imitate.

The value of d is a genetically heritable trait. At any time there are two genotypes
present in the population. Most of the population has d = d*, but there are a very few
rare mutants who have d = d* + δd. We seek to determine the values of d* that can resist
invasion by mutants with slightly smaller or slightly larger values of d. Such continuous
ESS solutions often yield the same outcome as genetically more realistic models. 

Let $v$ be the frequency of the favored behavior in the population. Assume that
selection is sufficiently weak so that the effect of selection on cultural evolution can
be ignored (i.e., on dynamics of $v$) and genetic evolution responds to the stationary
distribution of $v$. Finally, let $p_1(d) = \Pr(x > d)$, $p_2(d) = \Pr(x < -d)$, and
$L(d) = 1 - p_1(d) - p_2(d)$; $p_1(d)$ is the probability of correctly choosing the currently favored
behavior, $p_2(d)$ is the probability of mistakenly choosing the other behavior, and $L(d)$
is the probability of imitating. Then the frequency of the favored variant in the next
generation, $v'$, is:

\[
v' =
\begin{cases}
v L(d^*) + p_1(d^*) & \text{if no change in environment} \\
(1 - v) L(d^*) + p_2(d^*) & \text{if environment changes}
\end{cases}
\tag{A3.1}
\]

Suppose at some time $t$ the probability density for $v$ is $f_t(v)$ with mean $V_t$. Then
the mean of $f_{t+1}(v)$ is given by

\[
V_{t+1} = \int \left[ (1 - y)(v L + p_1) + y\left((1 - v) L + p_2\right) \right] f_t(v)\, dv
\tag{A3.2}
\]

Integrating and simplifying yield the following recursion for $V_t$:

\[ V_{t+1} = (1 - 2y)\left[V_t L + p_1\right] + y \tag{A3.3} \]

Thus, the equilibrium value of the mean frequency of the favored behavior is:

\[ \hat{V} = \frac{(1 - 2y)p_1 + y}{(1 - 2y)(p_1 + p_2) + 2y} \tag{A3.4} \]


The fitness of the common genotype averaged over the stationary distribution of $v$ is:

\[ W(d^*) = W_0 + D\left[\hat{V} L(d^*) + p_1(d^*)\right] - D\left[(1 - \hat{V}) L(d^*) + p_2(d^*)\right] \tag{A3.5} \]

and the fitness of the mutant type is

\[
W(d^* + \delta d) = W_0 + D\left[\hat{V} L(d^* + \delta d) + p_1(d^* + \delta d)\right]
- D\left[(1 - \hat{V}) L(d^* + \delta d) + p_2(d^* + \delta d)\right]
\tag{A3.6}
\]

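Thus, because $\delta d$ is small, the difference in fitness between the mutant and common
types, $\delta W$, is

\[
\delta W = D \left\{ (2\hat{V} - 1)\frac{\partial L}{\partial d} + \frac{\partial p_1}{\partial d} - \frac{\partial p_2}{\partial d} \right\} \delta d
\tag{A3.7}
\]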

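Setting $\delta W = 0$, substituting the expression for $\hat{V}$ given in A3.4, and simplifying yield
the following necessary condition for the ESS:

\[
\left[(1 - 2y)p_2 + y\right]\frac{\partial p_1}{\partial d} = \left[(1 - 2y)p_1 + y\right]\frac{\partial p_2}{\partial d}
\tag{A3.8}
\]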


Given that x is normal with a known mean and variance, this equation can be solved 
numerically for the value of d*. 
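
A numerical solution of the kind mentioned here might look like the following sketch. It assumes x is normal with mean m and variance 1, uses the equilibrium frequency of the favored behavior and the fitness expression given earlier in this appendix, and finds d* as the threshold at which the fitness gradient of a rare mutant (with the population frequency held fixed) changes sign. The bisection search, its upper bound, the function names, and the parameter values m = 0.5 and y = 0.1 are illustrative assumptions.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p1(d, m):
    """Pr(x > d): learning trial correctly favors the better behavior."""
    return 1.0 - normal_cdf(d - m)


def p2(d, m):
    """Pr(x < -d): learning trial mistakenly favors the other behavior."""
    return normal_cdf(-d - m)


def v_hat(d, m, y):
    """Equilibrium mean frequency of the favored behavior."""
    a = 1.0 - 2.0 * y
    return (a * p1(d, m) + y) / (a * (p1(d, m) + p2(d, m)) + 2.0 * y)


def mean_fitness(d, m, y, W0=1.0, D=1.0):
    """Fitness of the common type when the whole population uses threshold d."""
    L = 1.0 - p1(d, m) - p2(d, m)
    v = v_hat(d, m, y)
    return W0 + D * (v * L + p1(d, m)) - D * ((1.0 - v) * L + p2(d, m))


def invasion_gradient(d, m, y):
    """Derivative of mutant fitness with respect to its own threshold, divided by D.

    The population frequency v is held at its equilibrium for the common type.
    """
    v = v_hat(d, m, y)
    dp1 = -normal_pdf(d - m)   # d p1 / dd
    dp2 = -normal_pdf(d + m)   # d p2 / dd
    dL = -dp1 - dp2            # d L / dd
    return (2.0 * v - 1.0) * dL + dp1 - dp2


def ess_threshold(m, y, hi=8.0, tol=1e-10):
    """Bisection for the threshold at which the invasion gradient changes sign."""
    lo = 0.0
    if invasion_gradient(lo, m, y) <= 0.0:
        return 0.0             # more selective mutants cannot invade at d = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if invasion_gradient(mid, m, y) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_star = ess_threshold(m=0.5, y=0.1)
print(f"d* = {d_star:.4f}")
print(f"W(d*) = {mean_fitness(d_star, 0.5, 0.1):.4f}, W(0) = {mean_fitness(0.0, 0.5, 0.1):.4f}")
```

For these illustrative values the population at d* has higher average fitness than a population that never imitates, and when y = 1/2 the search returns d* = 0.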

We now prove that the average fitness of a population at the ESS value of $d$,
$d^*$, is greater than the average fitness of a population with no imitation (i.e., $d = 0$)
whenever $m > 0$ and $y < 1/2$. It follows from A3.8 that when $y = 1/2$ then $d^* = 0$
and, therefore, that $W(d^*) - W(0) = 0$. Next, we show that $W(d^*) - W(0)$ is a
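monotonically decreasing function of $y$ as long as $m$ is positive. Compute

\[
\frac{\partial}{\partial y}\left[W(d^*) - W(0)\right]
= 2D\,\frac{d\hat{V}}{dy}\,L(d^*)
+ D\,\frac{\partial d^*}{\partial y}\left\{(2\hat{V} - 1)\frac{\partial L}{\partial d} + \frac{\partial p_1}{\partial d} - \frac{\partial p_2}{\partial d}\right\}
\tag{A3.9}
\]

But the ESS condition (A3.7) guarantees that the term in braces on the right-hand
side of A3.9 is zero. Thus,

\[
\frac{\partial}{\partial y}\left[W(d^*) - W(0)\right] \propto \frac{d\hat{V}}{dy} \propto
p_2(d^*) - p_1(d^*)
+ (1 - 2y)\left[(1 - 2y)(p_1 + p_2) + 2y\right]\frac{\partial d^*}{\partial y}
\left\{(1 - \hat{V})\frac{\partial p_1}{\partial d} - \hat{V}\frac{\partial p_2}{\partial d}\right\}
\tag{A3.10}
\]

Once again the ESS condition guarantees that the term in braces on the right-hand
side of A3.10 is zero, and since $p_1(d^*) > p_2(d^*)$ for $m > 0$, it follows that the average
fitness of an ESS population is greater than the fitness of a population with no
imitation as long as $y < 1/2$.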

APPENDIX 4: Cumulative Learning 

Consider an organism that lives in an environment that can be in a continuum of
states. Each generation, there is a probability $y$ that the environment switches from
its current state to a new state drawn at random from a probability distribution with
mean equal to zero and variance equal to $V$. There is a probability $1 - y$ that the
environment will remain unchanged. There is also a continuum of behaviors. In each
environment, fitness is a Gaussian function of behavior so that there is a unique
optimum behavior $\theta_t$. We choose to measure the state of the environment as the
optimal behavior in that environment. All individuals modify their behavior by
learning so that the difference between their behavior and optimum behavior in the
current environment is reduced. There are two genotypes:

Learners = Acquire the optimal behavior. Learning costs reduce fitness
by a factor $e^{-C_L}$.

Imitators = Imitate a randomly chosen individual from the previous
generation, and then adjust their behavior a small fraction,
$a$ ($a < 1$), by learning. Learning costs reduce fitness by a
factor $e^{-C_I}$.

Suppose most individuals in a population are imitators, but that there are a small
number of rare learners. Because they always acquire the optimal behavior, the
expected fitness of learners is simply:

\[ W_L = \exp(-C_L) \tag{A4.1} \]

and the expected fitness of imitators is:

\[ W_I = \exp\left[-(1 - a)^2 (Z_t - \theta_t)^2 - C_I\right] \tag{A4.2} \]

where $Z_t$ is the behavior of imitators during period $t$, which will change from period
to period according to the following recursion:

\[ Z_{t+1} = a\theta_t + (1 - a)Z_t \tag{A4.3} \]

Thus, the behavior of imitators will converge toward the current optimum at a rate $a$.
When the environment changes, it will converge toward a different value. Assume
that selection is weak enough that changes in gene frequency respond to the
stationary distribution of $Z_t$. Thus, imitation is evolutionarily stable if

\[ -(1 - a)^2 E\{(Z_t - \theta_t)^2\} - C_I > -C_L \tag{A4.4} \]

where the expectation is taken with respect to the joint stationary distribution of $\theta_t$
and $Z_t$.

\[ E\{(Z_t - \theta_t)^2\} = E\{Z_t^2\} - 2E\{Z_t\theta_t\} + E\{\theta_t^2\} \tag{A4.5} \]

To compute $E\{Z_t\theta_t\}$, multiply both sides of A4.3 by $\theta_{t+1}$:

\[ \theta_{t+1}Z_{t+1} = a\theta_t\theta_{t+1} + (1 - a)Z_t\theta_{t+1} \tag{A4.6} \]

Taking the expectation of both sides yields:

\[
E\{\theta_{t+1}Z_{t+1}\} = a\left[(1 - y)V + y \cdot 0\right] + (1 - a)\left[(1 - y)E\{\theta_t Z_t\} + y \cdot 0\right]
\tag{A4.7}
\]

The moments of the stationary distribution are constant, and thus setting
$E\{Z_{t+1}\theta_{t+1}\} = E\{Z_t\theta_t\}$ and solving yields:

\[ E\{Z_t\theta_t\} = \frac{a(1 - y)V}{1 - (1 - a)(1 - y)} \tag{A4.8} \]

To compute $E\{Z_t^2\}$, square both sides of A4.3:

\[ Z_{t+1}^2 = a^2\theta_t^2 + 2a(1 - a)Z_t\theta_t + (1 - a)^2 Z_t^2 \tag{A4.9} \]

Again taking the expectation of both sides, setting $E\{Z_{t+1}^2\} = E\{Z_t^2\}$, and substituting
the expression for $E\{Z_t\theta_t\}$ yields:

\[ E\{Z_t^2\} = \frac{aV\left[1 + (1 - a)(1 - y)\right]}{(2 - a)\left[1 - (1 - a)(1 - y)\right]} \tag{A4.10} \]

Substituting the expressions for $E\{Z_t\theta_t\}$ and $E\{Z_t^2\}$ into A4.5 and simplifying yields:

\[ E\{(Z_t - \theta_t)^2\} = \frac{2yV}{(2 - a)\left[1 - (1 - a)(1 - y)\right]} \tag{A4.11} \]

Substituting this expression into A4.4, ignoring terms of order $a^2$, and simplifying
yield the following condition for imitation to be an ESS:

\[ \left(\frac{\delta}{1 - \delta}\right) a > y \tag{A4.12} \]

where $\delta = (C_l - C_i)/V$ is the fitness advantage of imitators due to lower cost learning
measured in units of $V$, the average log fitness increase of learners due to learning.
Because learning would not be favored by selection for learners if $V < C_l$, we know
that $\delta < 1$. Recall that $a$ is the rate at which imitators converge toward the current
optimum. Thus, the ESS condition, A4.12, says that the rate of environmental 
change must be less than the rate at which imitators converge toward the current 
optimum as modified by the term in parentheses. This term is greater than one when 
the learning cost advantage of imitators is a large fraction of the total benefit of 
learning and less than one when the learning cost advantage of imitators is relatively

Why Culture Is Common, but Cultural Evolution Is Rare


Cultural variation is common in nature. In creatures as diverse as 
rats, pigeons, chimpanzees, and octopuses, behavior is acquired through social 
learning. As a result, the presence of a particular behavior in a population makes 
it more likely that individuals in the next generation will acquire the same 
behavior, which, in turn, results in persistent differences between populations 
that are not due to genetic or environmental differences. 

In sharp contrast, cumulative cultural evolution is rare. Most culture in 
nonhuman animals involves behaviors that individuals can, and do, learn on their 
own. There are only a few well-documented cases in which cultural change ac- 
cumulates over many generations leading to the evolution of behaviors that 
no individual could invent — the only well-documented examples are song dialects 
in birds, perhaps some behaviors in chimpanzees, and, of course, many aspects of 
human behavior. 

We believe that this situation presents an important evolutionary puzzle. 
The ability to accumulate socially learned behaviors over many generations has 
allowed humans to develop subtle, powerful technologies and to assemble com- 
plex institutions that permit us to live in larger, and more complex, societies 
than any other mammal species. These accumulated cultural traditions allow us 
to exploit a far wider range of habitats than any other animal, so that even with 
only hunting and gathering technology, humans became the most widespread 
mammal on earth. The fact that simple forms of cultural variation exist in a wide 
variety of organisms suggests that intelligence and social life alone are not suf- 
ficient to allow cumulative cultural evolution. Cumulative cultural change seems 
to require some special, derived, probably psychological, capacity. Thus, we have
the puzzle: if cultural traditions are such a potent means of adaptation, why is
this capacity rare?

In this chapter we suggest one possible answer to this question. We begin by 
reviewing the literature on animal social learning. We then analyze two models 
of the evolution of the psychological capacities that allow cumulative cultural 
evolution. The results of these models suggest a possible reason why such ca- 
pacities are rare. 


Culture in Other Animals 

There has been much debate about whether other animals have culture. Some 
authors define culture in human terms. That is, the investigator examines human
cultural behavior and extracts a number of “essential” features. For example,
Tomasello, Kruger, and Ratner (1993) argue that culture is learned by all group
members, faithfully transmitted, and subject to cumulative change. Then to be 
cultural, the behavior of other animals must exhibit these features. Moreover, 
a heavy burden of proof is placed on those who would claim culture for other 
animals — if there is any other plausible interpretation, it is preferable. Others 
(McGrew, 1992; Boesch, 1993) argue that a double standard is being applied. 
If the behavioral variation observed among chimpanzee populations were instead 
observed among human populations, they argue, anthropologists would regard 
it as cultural. 

Such debates make little sense from an evolutionary perspective. The psy- 
chological capacities that underpin human culture must have homologies in the 
brains of other primates and perhaps other mammals as well. Moreover, the 
functional significance of social transmission in humans could well be related to 
its functional significance in other species. The study of the evolution of human 
culture must be based on categories that allow human cultural behavior to be 
compared to potentially homologous, functionally related behavior of other or- 
ganisms. At the same time, such categories should be able to distinguish between 
human behavior and the behavior of other organisms because it is quite plausible 
that human culture is different in important ways from related behavior in other 
species. 

Here we define cultural variation as differences among individuals that exist 
because they have acquired different behavior as a result of some form of social 
learning. Cultural variation is contrasted with genetic variation, differences among 
individuals that exist because they have inherited different genes from their par- 
ents, and environmental variation, differences among individuals due to the fact 
that they have experienced different environments. Cultural variation is often 
lumped together with environmental variation. However, as we have argued at 
length elsewhere (Boyd and Richerson, 1985), this is an error. Because cultural 
variation is transmitted from individual to individual, it is subject to population 
dynamic processes analogous to those that affect genetic variation and quite unlike
the processes that govern other environmental effects. Combining cultural and 
environmental effects into a single category conceals these important differences.

There is much evidence that cultural variation, defined this way, is very
common in nature. In a review of social transmission of foraging behavior,
Lefebvre and Palameta (1988) give 97 examples of cultural variation in foraging
behavior in animals as diverse as baboons, sparrows, lizards, and fish. Song
dialects are socially transmitted in many species of songbirds. Three decades of study
show that chimpanzees have cultural variation in subsistence techniques, tool
use, and social behavior (Wrangham, McGrew, de Waal, and Heltne, 1994;
McGrew, 1992).

There is little evidence, however, of cumulative cultural evolution in other 
species. With a few exceptions, social learning leads to the spread of behaviors 
that individuals could have learned on their own. For example, food preferences 
are socially transmitted in rats. Young rats acquire a preference for a food when 
they smell the food on the pelage of other rats (Galef, 1988). This process can
cause the preference for a new food to spread within a population. It can also 
lead to behavioral differences among populations living in the same environment 
because current foraging behavior depends on a history of social learning. How- 
ever, it does not lead to the cumulative evolution of new, complex behaviors 
that no individual rat could learn on its own. 

In contrast, human cultures do accumulate changes over many generations, 
resulting in culturally transmitted behaviors that no single human individual 
could invent on his own. Even in the simplest hunting and gathering societies, 
people depend on such complex, evolved knowledge and technology. To live in 
the arid Kalahari, the !Kung San need to know what plants are edible, how to find
them during different seasons, how to find water, how to track and find game,
how to make bows and arrow poison, and many other skills. The fact that the
!Kung can acquire the knowledge, tools, and skills necessary to survive the rigors
of the Kalahari is not so surprising; many other species can do the same. What is
amazing is that the same brain that allows the !Kung to survive in the Kalahari
also permits the Inuit to acquire the very different knowledge, tools, and skills 
necessary to live on the tundra and ice north of the Arctic circle, and the Ache the 
knowledge, tools, and skills necessary to live in the tropical forests of Paraguay. 
No other animal occupies a comparable range of habitats or utilizes a comparable 
range of subsistence techniques and social structures. Two kinds of evidence in- 
dicate that such differences result from cumulative cultural evolution of complex 
traditions. First, such gradual change is documented in both the historical and 
archaeological records. Second, cumulative change leads to a branching pattern of 
descent with modification in which more closely related populations share more 
derived characters than distantly related populations. Although the possibility of 
horizontal transmission among cultural lineages makes reconstructing such
cultural phylogenies difficult for “cultures” (Boyd, Richerson, Borgerhoff Mulder,
and Durham, 1997), patterns of cultural descent can be reconstructed for
particular cultural components, such as languages or technologies.

Circumstantial evidence suggests that the ability to acquire novel behaviors 
by observation is essential for cumulative cultural change. Students of animal 
social learning distinguish observational learning or true imitation, which occurs 
when younger animals observe the behavior of older animals and learn how to 
perform a novel behavior by watching them, from a number of other mechanisms
of social transmission that also lead to behavioral continuity without observational
learning (Galef, 1988; Visalberghi and Fragaszy, 1990; Whiten and Ham, 1992).
One such mechanism, local enhancement, occurs when the activity of older animals 
increases the chance that younger animals will learn the behavior on their own. If 
younger, naive individuals are attracted to the locations in the environment where 
older, experienced individuals are active, they will tend to learn the same behav- 
iors as the older individuals. Young individuals do not acquire the information 
necessary to perform the behavior by observing older individuals. Instead, the 
activity of others causes them to be more likely to acquire this information 
through interaction with the environment. Imagine a young monkey acquiring its 
food preferences as it follows its mother around. Even if the young monkey never 
pays any attention to what its mother eats, she will lead it to locations where some 
foods are common and others rare, and the young monkey may learn to eat much 
the same foods as mom. 

Local enhancement and observational learning are similar in that they both 
can lead to persistent behavioral differences among populations, but only
observational learning allows cumulative cultural change (Tomasello et al., 1993).
To see why, consider the cultural transmission of stone tool use. Suppose that on 
their own in especially favorable circumstances, an occasional early hominid 
learned to strike rocks together to make useful flakes. Their companions, who 
spent time near them, would be exposed to the same kinds of conditions and 
some of them might learn to make flakes too, entirely on their own. This be- 
havior could be preserved by local enhancement because groups in which tools 
were used would spend more time in proximity to the appropriate stones. How- 
ever, that would be as far as it would go. Even if an especially talented individual 
found a way to improve the flakes, this innovation would not spread to other 
members of the group because each individual learned the behavior anew. Local 
enhancement is limited by the learning capabilities of individuals and the fact 
that each new learner must start from scratch. With observational learning, on 
the other hand, innovations can persist as long as younger individuals are able to 
acquire the modified behavior by observational learning. To the extent that ob- 
servers can use the behavior of models as a starting point, observational learning 
can lead to the cumulative evolution of behaviors that no single individual could 
invent on her own. 

Most students of animal social learning believe that observational learning is 
limited to humans and, perhaps, chimpanzees and some bird species. Several lines 
of evidence suggest that observational learning is not responsible for cultural 
traditions in other animals. First, many of the behaviors, like potato washing in 
Japanese macaques, are relatively simple and could be learned independently by 
individuals in each generation. Second, new behaviors like potato washing often 
take a long time to spread through the group, a pace more consistent with the idea 
that each individual had to learn the behavior on her own. Finally, extensive 
laboratory experiments capable of distinguishing observational learning from 
other forms of social transmission like local enhancement have usually failed 
to demonstrate observational learning (Galef, 1988; Whiten and Ham, 1992; 
Tomasello et al., 1993; Visalberghi, 1993), except in humans and songbirds. (In
many songbirds, song traditions are transmitted by imitation, but little or nothing
else is.) The fact that observational learning appears limited to humans seems to
confirm that observational learning is necessary for cumulative cultural change. 
However, one must be cautious here because most students of animal social 
learning refuse to invoke observational learning unless all other possible expla- 
nations have been excluded. Thus, there actually may be many cases of obser- 
vational learning that are interpreted as social enhancement or some putatively 
simpler mechanism. A few well-controlled laboratory studies do apparently show 
some true imitation in nonhuman animals (Heyes, 1993; Dawson and Foss, 1965), 
and striking anecdotes suggest that observational learning may occur in organisms 
as diverse as parrots (Pepperberg, 1988) and orangutans (Russon and Galdikas, 
1993). 

Adaptation by cumulative cultural evolution is apparently not a by-product 
of intelligence and social life. Cebus monkeys are among the world’s cleverest 
creatures. In nature, they use tools and perform many complex behaviors, and 
in captivity, they can be taught extremely demanding tasks. Cebus monkeys 
live in social groups and have ample opportunity to observe the behavior of 
other individuals of their own species. Yet good laboratory evidence suggests that 
cebus monkeys make no use of observational learning. This suggests that ob- 
servational learning is not simply a by-product of intelligence and opportunity to 
observe conspecifics. Rather, observational learning seems to require special 
psychological mechanisms (Bandura, 1986). This conclusion suggests, in turn, 
that the psychological mechanisms that enable humans to learn by observation 
are adaptations that have been shaped by natural selection because culture is 
beneficial. Of course, this need not be the case. Observational learning could be a 
by-product of some other adaptation that is unique to humans, such as bipedal- 
ism, dependence on complex vocal communication, or the capacity for decep- 
tion. However, given the great importance of culture in human affairs, it is 
reasonable to think about the possible adaptive advantages of culture. In what 
follows we consider two mathematical models of the evolution of the capacity 
for observational learning based on this assumption. 


Models of the Evolution of Social Learning 

The maintenance of cultural variation involves two different processes (figure 
3.1). First, there must be some kind of transmission of information from one brain 
to another. Consider, for example, the maintenance of the use of a particular kind 
of tool. Individuals have information stored in their brain that allows them to 
manufacture and use the tool. For use of the tool to persist through time, ob- 
serving tool use and manufacture must cause individuals in the next “generation” 
to acquire information that allows them to manufacture and use the same tool. 
(We put generation in quotes because the same model can be used to represent 
culture change occurring on much shorter time scales. See Boyd and Richerson, 
1985: 68-69.) As we have seen, this transmission may occur because individuals 
can learn how to make and use tools by observation, or because observation 
stimulates them to learn on their own how to make and use the tool, for example 
by local enhancement. Second, individuals must preserve the information that
allows them to make and use the tool until such time that they serve as models for
the next generation of individuals. Such persistence may fail to occur for two
different reasons: individuals may forget how to make or use the tool, or they
may, as a result of interacting with the environment, modify the information
stored in their brains so that they make or use the tool in a significantly different
way. Without both transmission and persistence, there can be no culturally
transmitted variation.

Figure 3.1. The maintenance of cultural transmission requires both the accurate
transmission of mental representations from experienced to inexperienced individuals
and the persistence of those representations through the lives of individuals until such
time that they act as models for others.

Our previous work on the evolution of culture (Boyd and Richerson, 1985,
1988, 1989, 1995) has focused on the evolution of persistence. All of the
models analyzed in these studies assume that transmission occurs and consider 
the evolution of genes that affect the extent to which behavior acquired by 
imitation is modified by individual learning. They differ in how the trait is 
modelled (discrete vs. continuous), how environmental variation is modelled, 
whether individuals are sensitive to the number of models who exhibit a par- 
ticular cultural variant, and a number of other features. This work leads to the 
robust conclusion that natural selection will favor individuals who do not modify
culturally acquired behavior when individual learning is costly or error-prone,
and environments are variable, but not too variable. Thus, natural selection can
favor persistence. (See Rogers, 1989, for a related model.)

In several articles, Feldman and his co-workers (Cavalli-Sforza and Feldman, 
1983a, 1983b; Aoki and Feldman, 1987) have considered the evolution of genes
that affect transmission. In these models it is assumed that there is a beneficial 
trait that can be acquired only by cultural transmission, not by individual 
learning. They further allow for the possibility that successful transmission re- 
quires new behavior both on the part of the individual acquiring the behavior 
and in the individual modelling the behavior. Thus, there are two different 
genetic loci, one affecting the behavior of the transmitter and a second affecting 
the behavior of the receiver. For transmission to evolve, there must be substi- 
tutions at both loci. These models are very relevant to the evolution of com- 
munication systems. However, they cannot address the questions posed here 
because the culturally transmitted trait cannot be acquired or modified by in- 
dividual learning. 

Here we consider two models of the evolution of psychological capacities 
that allow the transmission of behavior that can be acquired or modified through 
individual learning. Each model is designed to answer the same basic question: 
what are the conditions under which selection can favor a costly psychological 
capacity that allows individuals to acquire behavior by imitation? The primary 
difference between the models is the nature of the culturally transmitted be- 
havior. In the first model, the behavior is discrete — individuals are either skilled 
or unskilled, and the skill can be acquired either by social or individual learning. 
In the second model, there is a continuum of behaviors subject to stabilizing 
selection. Only the continuous trait model allows true cumulative cultural change 
leading to behaviors that individuals cannot learn on their own. However, the 
discrete model allows us to investigate the effects of several factors that are dif- 
ficult to include in the continuous character model. As we will see, both models 
tell a similar story about why there is a selective barrier to the evolution of the 
capacity for observational learning and why capacities that allow local enhance- 
ment and related mechanisms do not face a similar barrier. 


Discrete Character Model 

Consider an organism that lives in a temporally variable environment that can be
in an infinite number of states. In each state, individuals can acquire a skill that
increases fitness, so that unskilled individuals have fitness $W_0$ and skilled
individuals have fitness $W_0 + D$. Each generation there is a probability $\gamma$ that the
environment switches from its current state to a different state. When this
occurs, the old skill is no longer useful in the new environment.

There are two genotypes with different learning rules. Individual learners
acquire the skill appropriate to the current environment with probability $\delta$ at
a cost $C_i$. Social learners observe $n$ randomly selected members of the previous
generation. If there is a skilled individual among the $n$, an imitator acquires
the skill at cost $C_s$. Otherwise they acquire the skill with probability $\delta$ at a cost
$C_i$. The ability to acquire the skill by social learning reduces the fitness of an
individual by an amount $K$. Thus, parameters $C_i$ and $C_s$ give the variable costs of
individual and social learning, respectively, and $K$ gives the fixed cost associated
with the capacity for social learning.

It is shown in the appendix that social learning can increase when rare and is 
the only ESS when the following condition holds: 

$\left(1 - (1 - \delta)^n\right)(1 - \gamma)\left[D(1 - \delta) + C_i - C_s\right] > K$ (1)

When expression (1) is true, social learning has higher fitness than individual
learning no matter what the mix of the two types in the population. The term
in square brackets gives the fitness benefit of acquiring the skill through social
rather than individual learning: $C_i - C_s$ is the advantage that results from the
fact that social learning may reduce the cost of acquiring the trait, and $D(1 - \delta)$
is the advantage that results from being more likely to acquire the skill. Sensibly,
the latter term implies that the fitness advantage of social learning increases as
the likelihood that individuals will learn the trait on their own, $\delta$, decreases. The
less likely it is that individual learners will acquire the skill, the bigger the relative
advantage that accrues to social learning. The fitness benefit is discounted by
the two factors on the left-hand side of expression (1). The term $1 - \gamma$ expresses
the fact that social learning is beneficial only if the environment has not changed,
and the term $1 - (1 - \delta)^n$ gives the probability that at least one of the $n$ individuals
from the previous generation will have acquired the behavior when social
learning is rare. Notice that this latter term decreases as the probability of
learning the trait decreases. Thus, the net advantage of social learning is highest at
intermediate values of $\delta$, when there is a good chance that individuals will learn
the skill on their own, but also a good chance that they won’t.
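
The dependence of condition (1) on $\delta$ can be illustrated with a short calculation. The Python sketch below is illustrative only. It reads the left-hand side of (1) as $[1 - (1 - \delta)^n](1 - \gamma)[D(1 - \delta) + C_i - C_s]$, and the function names and parameter values are hypothetical choices made here, not values from the text.

```python
def benefit_when_rare(delta, gamma, n, D, C_i, C_s):
    """Left-hand side of condition (1): expected gain of a rare social learner."""
    prob_skilled_model = 1.0 - (1.0 - delta) ** n      # at least one of n models is skilled
    gain_from_copying = D * (1.0 - delta) + C_i - C_s  # benefit of social over individual learning
    return prob_skilled_model * (1.0 - gamma) * gain_from_copying


def can_invade(delta, gamma, n, D, C_i, C_s, K):
    """True when condition (1) holds, so that rare social learners increase."""
    return benefit_when_rare(delta, gamma, n, D, C_i, C_s) > K


if __name__ == "__main__":
    # Hypothetical values: equal learning costs, one model, slow environmental change.
    for delta in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(delta, round(benefit_when_rare(delta, 0.1, 1, 1.0, 0.1, 0.1), 4))
```

With equal learning costs and $n = 1$, the benefit reduces to $\delta(1 - \delta)(1 - \gamma)D$, which is largest at $\delta = 0.5$ and falls toward zero at either extreme, in line with the verbal argument.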

When (1) is not satisfied, there is a range of conditions in which social
learning cannot increase when rare, but is an ESS once it becomes common. In
this analysis we are limited to the case $n = 1$ because when $n > 1$ the dynamics of
the cultural traits are nonlinear, and such systems are difficult to analyze in
autocorrelated random environments. With this assumption, social learning is an
ESS when:

$\frac{\delta(1 - \gamma)\left(D(1 - \delta) + C_i - C_s\right)}{\gamma + (1 - \gamma)\delta} > K$ (2)

To compare this expression with (1), notice that when $n = 1$, $1 - (1 - \delta)^n = \delta$,
and, thus, the benefit of social learning when it is common is the benefit when
rare divided by the term $\gamma + (1 - \gamma)\delta$. When individual learners are likely to
acquire the skill (so that $\delta$ is large), the conditions for social learning to increase
when rare (1) and to persist when common (2) will be similar. However, when
individual learners are unlikely to acquire the skill ($\delta \ll 1$) and the rate of
environmental change is slow ($\gamma \ll 1$), social learning will be able to persist when
common under a much wider range of conditions than it can increase when it is
rare. When social learning is rare, most of the population will be individual
learners who have little chance of acquiring the skill. As a consequence, social
learning will provide little benefit because there will be few skilled individuals to
observe. When social learning is common, the population will slowly accumulate
the skill over many generations. If the environment does not change too often,
the social learning population will spend most of the time with the skill at high
frequency, and thus the cost of the capacity for social learning need only be less
than the net benefit of acquiring the skill by individual learning.
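
The gap between the two conditions can be made concrete with a small Python sketch. It is offered as an illustration under stated assumptions rather than as the analysis in the appendix. It takes $n = 1$, reads the left-hand side of (2) as the left-hand side of (1) divided by $\gamma + (1 - \gamma)\delta$, and assumes that in a population of social learners the frequency of the skill resets to $\delta$ after an environmental change and otherwise rises from $p$ to $p + (1 - p)\delta$ each generation. The function names and parameter values are hypothetical.

```python
import random


def benefit_when_rare(delta, gamma, D, C_i, C_s):
    """Left-hand side of condition (1) with n = 1."""
    return delta * (1.0 - gamma) * (D * (1.0 - delta) + C_i - C_s)


def benefit_when_common(delta, gamma, D, C_i, C_s):
    """Left-hand side of condition (2): the rare benefit divided by gamma + (1 - gamma) * delta."""
    return benefit_when_rare(delta, gamma, D, C_i, C_s) / (gamma + (1.0 - gamma) * delta)


def mean_skill_frequency(delta, gamma, generations=200_000, seed=1):
    """Time-averaged skill frequency when everyone is a social learner (assumed dynamics)."""
    rng = random.Random(seed)
    p, total = delta, 0.0
    for _ in range(generations):
        total += p
        if rng.random() < gamma:
            p = delta                   # environment changed: only individual learning helps
        else:
            p = p + (1.0 - p) * delta   # copy a skilled model, otherwise try individual learning
    return total / generations


if __name__ == "__main__":
    delta, gamma = 0.05, 0.02           # hard-to-learn skill, slowly changing environment
    print(benefit_when_rare(delta, gamma, 1.0, 0.1, 0.1))
    print(benefit_when_common(delta, gamma, 1.0, 0.1, 0.1))
    print(mean_skill_frequency(delta, gamma), delta / (gamma + (1.0 - gamma) * delta))
```

Under these assumptions the simulated long-run frequency of the skill should sit close to $\delta / [\gamma + (1 - \gamma)\delta]$, which is far above $\delta$ when both $\delta$ and $\gamma$ are small. That is why the benefit of social learning when common can exceed the benefit when rare by an order of magnitude.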

Continuous Character Model 

Consider an organism that is characterized by a single quantitative character
subject to stabilizing selection. During generation $t$ the optimum value of the
quantitative character is $\theta_t$. Each generation there is a probability $\gamma$ that the
environment changes. If the environment does not change, then $\theta_{t+1} = \theta_t$. If it
does change, then $\theta_{t+1}$ is a normal random variable with mean $\Theta$ and variance
$H$. Notice that this assumption implies that $\Theta$ is the long-run optimum trait
value.

Each individual acquires its trait value through a combination of genetic
transmission, imitation, and individual learning. The adult trait value, $x$, is given by:

$x = (1 - a)[(1 - i)\Theta + iy] + a\theta_t$ (3)

The term $(1 - i)\Theta + iy$ represents a “norm of reaction,” which forms the basis
for subsequent individual learning. It is acquired as the result of a combination of
a genetically acquired norm of reaction at the long-run optimum, $\Theta$, and the
observed trait value, $y$, of a randomly selected member of the previous
generation. The parameter $i$ governs the relative importance of genetic inheritance and
imitation in determining the norm of reaction. When $i = 0$, the norm of reaction
is completely determined by an innate, genetically inherited value. As $i$ increases,
the observed trait value of another individual has greater influence on the trait
until, when $i = 1$, the norm of reaction is completely determined by
observational learning. Because observational learning is assumed to require
special-purpose cognitive machinery, individuals incur a fitness cost proportional to the
importance of observational learning in determining their norm of reaction, $iC$.
Thus, $C$ measures the incremental cost of the capacity for observational learning.
Individuals adjust their adult behavior from the norm of reaction toward the
current optimum by a fraction $a$. To capture the idea that cumulative change is
possible, we assume that $a$ is small, so that the repeated action of learning and
social transmission can lead to fitness increases that could not be attained by
individual learning.
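
The way small learning steps accumulate under imitation can be seen by iterating the trait equation. The Python sketch below is a deterministic illustration under simplifying assumptions made here: every individual in a generation is identical, the environment stays constant, and the first models sit at the innate value. It reads the trait equation as $x = (1 - a)[(1 - i)\Theta + iy] + a\theta_t$, with `big_theta` standing for the long-run optimum. The function names and numbers are hypothetical.

```python
def adult_trait(y, theta, big_theta, a, i):
    """Adult trait value given the observed model trait y (equation 3)."""
    norm_of_reaction = (1.0 - i) * big_theta + i * y   # mix of innate value and imitation
    return (1.0 - a) * norm_of_reaction + a * theta    # learning moves a fraction a toward theta


def trajectory(theta, big_theta, a, i, generations):
    """Adult trait in successive generations of a uniform population."""
    y = big_theta            # assumed starting point: the first models sit at the innate value
    traits = []
    for _ in range(generations):
        y = adult_trait(y, theta, big_theta, a, i)     # this generation becomes the next models
        traits.append(y)
    return traits


if __name__ == "__main__":
    theta, big_theta, a = 10.0, 0.0, 0.1               # hypothetical optimum far from the innate value
    print(trajectory(theta, big_theta, a, 0.0, 50)[-1])  # no imitation
    print(trajectory(theta, big_theta, a, 1.0, 50)[-1])  # full imitation
```

Without imitation ($i = 0$) every generation ends up the same small step from the innate value, whereas with full imitation ($i = 1$) the gap to the optimum shrinks by a factor $1 - a$ each generation, so the population eventually reaches behavior that no single generation of learners could attain.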

With these assumptions it is shown in the appendix that a population in
which most individuals do not imitate can be invaded by rare individuals who
imitate a little bit only if

$(1 - \gamma)aH > C$ (4)

The parameter $H$ is a measure of how far the population is from the optimum
in fitness units, on average, immediately after an environmental change. Since a
population without imitation always starts from the same norm of reaction, $\Theta$,
the term $aH$ is a measure of the average fitness improvement due to individual
learning in a single generation. Thus, (4) says that imitation can evolve only
when the benefit of imitating what individuals can learn on their own is sufficient
to compensate for the costs of the capacity to imitate.

In contrast, the condition for social learning to be maintained once it is
common is much more easily satisfied. It is shown in the appendix that a
population in which $i = 1$ can resist invasion by rare alleles that reduce the reliance
on imitation whenever:

$\frac{(1 - \gamma)aH}{\gamma + (1 - \gamma)a} > C$ (5)

If the rate at which the population adapts by individual learning, $a$, is greater
than the rate at which the environment changes, $\gamma$, then a population in which
social learning is common spends most of its time with the mean behavior near
the optimum. Thus, (5) says that imitation is evolutionarily stable as long as the
cost of the capacity is less than a substantial fraction of the total improvement in
fitness due to many generations of social learning.
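
The size of the barrier implied by the two conditions can be computed directly. The following Python sketch is illustrative and assumes that condition (4) reads $(1 - \gamma)aH > C$ and condition (5) reads $(1 - \gamma)aH / [\gamma + (1 - \gamma)a] > C$. The function names, the labels returned by `classify`, and the parameter values are hypothetical.

```python
def invasion_threshold(a, gamma, H):
    """Largest cost C at which rare imitators can still invade, from condition (4)."""
    return (1.0 - gamma) * a * H


def stability_threshold(a, gamma, H):
    """Largest cost C at which common imitation still resists invasion, from condition (5)."""
    return (1.0 - gamma) * a * H / (gamma + (1.0 - gamma) * a)


def classify(C, a, gamma, H):
    """Describe where a given cost C falls relative to the two thresholds."""
    if C < invasion_threshold(a, gamma, H):
        return "can invade when rare"
    if C < stability_threshold(a, gamma, H):
        return "stable when common, cannot invade when rare"
    return "not favored"


if __name__ == "__main__":
    a, gamma, H = 0.1, 0.02, 1.0        # slow learning, slower environmental change
    print(invasion_threshold(a, gamma, H), stability_threshold(a, gamma, H))
    print(classify(0.5, a, gamma, H))
```

With these hypothetical values the stability threshold is several times larger than the invasion threshold, so there is a wide band of costs for which imitation would persist if common but cannot get started when rare.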


Discussion 

Both of these models tell a similar story about the evolution of capacities that 
allow social learning. When social learning is rare, the only useful behavior that 
is present in the population, and thus the only behavior that can be acquired by 
social learning, is behavior that individuals can learn on their own. In contrast, 
when social learning is common, the population accumulates adaptive behavior 
over many generations, and, as long as the environment does not change faster 
than adaptive behavior accumulates, social learning allows individuals to acquire 
behaviors that are much more adaptive than they could acquire on their own. 

This result provides a potential explanation for why cultural variation is 
so common in nature but cumulative cultural evolution so rare. Capacities that 
increase the chance that individuals will learn behaviors that they could learn on 
their own will be favored as long as they are relatively cheap. On the other hand, 
even though the benefits of cumulative cultural evolution are potentially sub- 
stantial, selection cannot favor a capacity for observational learning when rare. 
Thus, unless observational learning substantially reduces the cost of individual 
learning, it will not increase because there is an “adaptive valley” that must be 
crossed before benefits of cumulative cultural change are realized. This argument 
suggests, in turn, that it is likely that the capacities that allow the initial evo- 
lution of observational learning must evolve as a side effect of some other adap- 
tive change. For example, it has been argued that observational learning requires 
that individuals have what psychologists and philosophers call a “theory of mind” 
(Cheney and Seyfarth, 1990; Tomasello et al., 1993). That is, imitators must be 
able to understand that others have different beliefs and goals from theirs. 
Lacking such a theory, typical animals cannot make a connection between the 
acts of other animals and their own goal states and thus can’t interpret the acts of 
other animals as acts they might usefully perform. A theory of mind may have 
initially evolved to allow individuals to better predict the behavior of other 
members of their social group. Once it had evolved for that reason, it could 
be elaborated because it allowed observational learning and cumulative cultural 
evolution.

APPENDIX 1: Analysis of Discrete Character Model

Individual learners always have the same fitness:

$W_I = W_0 + \delta D - C_i$ (A1.1)

The expected fitness of social learners depends on the frequency of social learners in
the previous generation, $q$, the frequency of skilled individuals among social learners,
$p$, and whether the environment has changed during the previous generation:

$$W_S = \gamma(W_0 + \delta D - C_I) + (1-\gamma)\{\pi(W_0 + D - C_S) + (1-\pi)(W_0 + \delta D - C_I)\} - K \tag{A1.2}$$

where $\pi$ is the probability that at least one of the $n$ individuals in the sample of
models has acquired the skill favored in the previous environment, and can be
calculated as:

$$\pi = \sum_{i=0}^{n} \binom{n}{i} q^i (1-q)^{n-i}\left[1 - (1-p)^i (1-\delta)^{n-i}\right] \tag{A1.3}$$

To understand this expression, assume that there are $i$ social learners among the $n$
models observed by a given, naive social learner. The probability that all $i$ of the social
learners are not skilled is $(1-p)^i$, and the probability that the remaining $n-i$
individual learners are not skilled is $(1-\delta)^{n-i}$, and therefore, the probability that
there is at least one skilled individual among the $n$, given that there are $i$ social
learners, is $1-(1-p)^i(1-\delta)^{n-i}$. Then to calculate $\pi$, take the expectation over
all values of $i$.
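
Expression A1.3 can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the closed form used for comparison is an added observation that follows from the binomial theorem, $\pi = 1 - [q(1-p) + (1-q)(1-\delta)]^n$.

```python
from math import comb


def pi_binomial(n, q, p, delta):
    """Expression A1.3: expectation over i, the number of social learners among n models."""
    total = 0.0
    for i in range(n + 1):
        # Probability that exactly i of the n models are social learners.
        prob_i = comb(n, i) * q**i * (1 - q) ** (n - i)
        # Probability that none of the n models is skilled, given i social learners.
        none_skilled = (1 - p) ** i * (1 - delta) ** (n - i)
        total += prob_i * (1 - none_skilled)
    return total


def pi_closed_form(n, q, p, delta):
    """Same quantity via the binomial theorem (an added shortcut, not in the source)."""
    return 1 - (q * (1 - p) + (1 - q) * (1 - delta)) ** n


# Illustrative parameter values (assumptions, chosen only for the demonstration).
print(pi_binomial(3, 0.4, 0.6, 0.2), pi_closed_form(3, 0.4, 0.6, 0.2))
```

With $q = 0$ the closed form reduces to $1-(1-\delta)^n$, and with $n = 1$ it reduces to $1 - q(1-p) - (1-q)(1-\delta)$, the two special cases treated in this appendix.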

Thus, social learners will have higher fitness in a particular generation if

$$W_S - W_I = (1-\gamma)\pi\left(D(1-\delta) + C_I - C_S\right) - K > 0 \tag{A1.4}$$

We consider two special cases. Case 1: $q \approx 0$, $\pi \approx 1 - (1-\delta)^n$. When social
learners are rare, they will observe only individual learners, and thus the probability
of observing at least one skilled individual does not depend on $q$ or $p$. Thus, social
learning will increase when rare if:

$$\left(1 - (1-\delta)^n\right)(1-\gamma)\left(D(1-\delta) + C_I - C_S\right) - K > 0 \tag{A1.5}$$

Immediately after an environmental change, the frequency of skilled individuals among
social learners is $\delta$ and then increases monotonically until the next environmental
change. Thus, the expected value of $\pi$ is greater than $1 - (1-\delta)^n$, and if social
learning can increase when rare, it will continue to increase until it reaches fixation.

Case 2: $n = 1$, $\pi = 1 - q(1-p) - (1-q)(1-\delta)$. Assume that selection is sufficiently
weak so that the effect of selection on cultural evolution can be ignored (i.e.,
on dynamics of $p$), and genetic evolution (the dynamics of $q$) responds to the
stationary distribution of $p$.

Then the frequency of the currently favored behavior after learning and imitation,
$p'$, is:

$$p' = \begin{cases} \delta & \text{if environment changes} \\ (qp + (1-q)\delta)(1-\delta) + \delta & \text{if environment does not change} \end{cases} \tag{A1.6}$$

Suppose at some time $t$ the probability density for $p$ is $f_t(p)$ with mean $P_t$. Then the
mean of $f_{t+1}(p)$ is given by:

$$P_{t+1} = \int \left[(1-\gamma)\left((qp + (1-q)\delta)(1-\delta) + \delta\right) + \gamma\delta\right] f_t(p)\,dp \tag{A1.7}$$

Integrating yields the following recursion for $P_t$:

$$P_{t+1} = \gamma\delta + (1-\gamma)\left[(qP_t + (1-q)\delta)(1-\delta) + \delta\right] \tag{A1.8}$$

Thus, the equilibrium value of mean frequency of the favored behavior is:

$$P = \frac{\delta + (1-\gamma)(1-q)\delta(1-\delta)}{1-(1-\gamma)(1-\delta)q} \tag{A1.9}$$


Assume that selection is weak enough that the dynamics of $q$ respond to the
stationary distribution of $p$. Then, since the expression for $W_S$ is linear in $p$ when
$n = 1$, we can substitute $P$ for $p$:

$$\pi = \frac{\delta}{1-(1-\gamma)(1-\delta)q} \tag{A1.10}$$
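
The step from A1.9 to A1.10 can be spelled out as follows. This is a worked check under the appendix's own definitions, using the Case 2 identity $\pi = 1 - q(1-p) - (1-q)(1-\delta) = qp + (1-q)\delta$ with $p$ replaced by the equilibrium mean $P$:

$$\begin{aligned}
\pi &= qP + (1-q)\delta \\
&= \frac{q\delta + (1-\gamma)(1-q)(1-\delta)\delta q + (1-q)\delta\left[1-(1-\gamma)(1-\delta)q\right]}{1-(1-\gamma)(1-\delta)q} \\
&= \frac{\delta}{1-(1-\gamma)(1-\delta)q}
\end{aligned}$$

The two terms containing $(1-\gamma)(1-q)(1-\delta)\delta q$ cancel, leaving $q\delta + (1-q)\delta = \delta$ in the numerator.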


Notice that $\pi > \delta$, which implies that social learners are more likely on average to
acquire the skill. Substituting A1.10 into A1.4 yields the following condition for social
learning to increase in frequency:

$$\frac{(1-\gamma)\left(D(1-\delta) + C_I - C_S\right)\delta}{1-(1-\gamma)(1-\delta)q} > K \tag{A1.11}$$
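
The Case 2 results can be verified numerically. The sketch below iterates recursion A1.8, compares the result with the closed forms A1.9 and A1.10, and then evaluates condition A1.11. The function names and the parameter values in the usage lines are illustrative assumptions, not taken from the original analysis.

```python
def next_mean(P, q, delta, gamma):
    """One step of recursion A1.8 for the mean frequency of the favored behavior."""
    return gamma * delta + (1 - gamma) * ((q * P + (1 - q) * delta) * (1 - delta) + delta)


def equilibrium_mean(q, delta, gamma):
    """Closed-form equilibrium A1.9."""
    numerator = delta + (1 - gamma) * (1 - q) * delta * (1 - delta)
    return numerator / (1 - (1 - gamma) * (1 - delta) * q)


def pi_equilibrium(q, delta, gamma):
    """Expression A1.10: chance that the single observed model is skilled."""
    return delta / (1 - (1 - gamma) * (1 - delta) * q)


def social_learning_increases(q, delta, gamma, D, C_I, C_S, K):
    """Condition A1.11 for social learning to increase in frequency."""
    advantage = (1 - gamma) * (D * (1 - delta) + C_I - C_S) * pi_equilibrium(q, delta, gamma)
    return advantage > K


# Hypothetical parameter values, chosen only for the demonstration.
q, delta, gamma = 0.3, 0.2, 0.1
P = delta  # frequency immediately after an environmental change
for _ in range(200):
    P = next_mean(P, q, delta, gamma)
print(P, equilibrium_mean(q, delta, gamma))
print(social_learning_increases(q, delta, gamma, D=1.0, C_I=0.3, C_S=0.1, K=0.05))
```

Because each step of A1.8 shrinks the distance to equilibrium by the factor $(1-\gamma)(1-\delta)q < 1$, the iteration converges from any starting value between 0 and 1.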


APPENDIX 2: Analysis of Continuous Character Model

Since we are free to determine the scale of measurement of trait values, we can,
without loss of generality, set ® = 0. Then the mean value of $x$ in the population
during generation $t$, $X_t$, is:

$$X_t = (1-a)\,i\,X_{t-1} + a\theta_t \tag{A2.1}$$

The logarithm of the fitness of an individual with adult trait value $x$ is proportional to:

$$\ln(W) \propto -(x - \theta_t)^2 - C(i) \tag{A2.2}$$

Thus, the expected fitness of an individual whose behavioral acquisition is governed
by the parameter $i$ is:

$$E\{\ln(W)\} \propto -(1-a)^2 E\{[iX_{t-1} - \theta_t]^2\} - C(i) \tag{A2.3}$$

Consider the competition between two genotypes. The common type has development
characterized by parameter $i$ and the rare type by $i + \delta$, where $\delta$ is very
small. If one assumes that changes in $i$ have no effect on the variance of the trait
among the invading type individuals, the expected fitness of the invading type is
approximately proportional to:

$$E\{\ln(W)\} \propto -(1-a)^2\left[(i^2 + 2i\delta)E\{X_{t-1}^2\} - 2(i+\delta)E\{X_{t-1}\theta_t\} + E\{\theta_t^2\}\right] - C(i) - \frac{\partial C}{\partial i}\delta \tag{A2.4}$$

Combining expression A2.3 and A2.4 shows that the invading type will increase in
frequency if:

$$-(1-a)^2\left[2i\delta E\{X_{t-1}^2\} - 2\delta E\{X_{t-1}\theta_t\}\right] - \frac{\partial C}{\partial i}\delta > 0 \tag{A2.5}$$

To calculate $E\{X_{t-1}\theta_t\}$ first notice the following:

$$\theta_t = \begin{cases} \theta_{t-1} & \text{with probability } 1-\gamma \\ \varepsilon & \text{with probability } \gamma \end{cases} \tag{A2.6}$$

where $\varepsilon$ is an independent normal random variable with mean zero and variance $H$.
Thus, it follows that:

$$E\{\theta_t X_{t-1}\} = (1-\gamma)E\{\theta_{t-1}X_{t-1}\} + \gamma E\{X_{t-1}\varepsilon\} \tag{A2.7}$$

Multiplying both sides of A2.1 by $\theta_t$ and taking the expectation with respect to the
joint stationary distributions yield:

$$E\{\theta_t X_t\} = (1-a)\,i\,E\{\theta_t X_{t-1}\} + aH \tag{A2.8}$$

Combining A2.7 and A2.8 yields the following expression for $E\{X_{t-1}\theta_t\}$:

$$E\{X_{t-1}\theta_t\} = \frac{(1-\gamma)aH}{1 - i(1-\gamma)(1-a)} \tag{A2.9}$$

To calculate $E\{X_{t-1}^2\}$, square both sides of A2.1, take the expectation, and use A2.9:

$$E\{X_{t-1}^2\} = \frac{a^2 H + 2ai(1-a)E\{X_{t-1}\theta_t\}}{1 - i^2(1-a)^2} \tag{A2.10}$$

Substituting A2.9 and A2.10 into A2.5 and simplifying yield expressions (4) and (5)
in the text.
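
The stationary moments in A2.9 and A2.10 can be checked by simulation. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it simulates the environmental process A2.6 and the mean-trait recursion A2.1 directly, then compares the sample averages with the closed forms. The function names, parameter values, run length, and random seed are all hypothetical choices.

```python
import random


def simulate_moments(a, i, gamma, H, steps=200_000, seed=1):
    """Estimate E{X_{t-1} theta_t} and E{X_{t-1}^2} by simulating A2.1 and A2.6."""
    rng = random.Random(seed)
    sd = H ** 0.5
    theta = rng.gauss(0.0, sd)
    X = 0.0
    burn_in = 1_000  # discard early steps so the start value does not matter
    sum_cross = 0.0
    sum_square = 0.0
    for step in range(steps + burn_in):
        # A2.6: the optimum persists with probability 1 - gamma, else it is redrawn.
        if rng.random() < gamma:
            theta = rng.gauss(0.0, sd)
        if step >= burn_in:
            sum_cross += X * theta  # X still holds X_{t-1} at this point
            sum_square += X * X
        # A2.1: update the population mean trait value.
        X = (1 - a) * i * X + a * theta
    return sum_cross / steps, sum_square / steps


def predicted_cross(a, i, gamma, H):
    """Closed form A2.9."""
    return (1 - gamma) * a * H / (1 - i * (1 - gamma) * (1 - a))


def predicted_square(a, i, gamma, H):
    """Closed form A2.10, using A2.9 for the cross moment."""
    cross = predicted_cross(a, i, gamma, H)
    return (a**2 * H + 2 * a * i * (1 - a) * cross) / (1 - i**2 * (1 - a) ** 2)


# Hypothetical parameter values, chosen only for the demonstration.
params = dict(a=0.5, i=0.8, gamma=0.2, H=1.0)
print(simulate_moments(**params))
print(predicted_cross(**params), predicted_square(**params))
```

The simulated averages carry sampling error, so agreement with the closed forms should be expected only to within a few hundredths at this run length.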

Climate, Culture, and the
Evolution of Cognition 


What are the causes of the evolution of complex cognition? 
Discussions of the evolution of cognition sometimes seem to assume that 
more complex cognition is a fundamental advance over less complex cogni- 
tion, as evidenced by a broad trend toward larger brains in evolutionary his- 
tory. Evolutionary biologists are suspicious of such explanations because they 
picture natural selection as a process leading to adaptation to local environ- 
ments, not to progressive trends. Cognitive adaptations will have costs, and 
more complex cognition will evolve only when its local utility outweighs its costs.

In this chapter, we argue that Cenozoic trends in cognitive complexity 
represent adaptations to an increasingly variable environment. The main support 
for this hypothesis is a correlation between environmental deterioration and 
brain size increase in many mammalian lineages. 

We would also like to understand the sorts of cognitive mechanisms that 
were favored in building more complex cognitions. The problem is difficult be- 
cause little data exist on the adaptive trade-offs and synergies between different 
cognitive strategies for adapting to variable environments. Animals might use 
information-rich, innate decision-making abilities, individual learning, social 
learning, and, at least in humans, complex culture, alone or in various combi- 
nations, to create sophisticated cognitive systems. 

We begin with a discussion of the correlated trends in environmental 
deterioration and brain size evolution and then turn to the problem of what 
sorts of cognitive strategies might have served as the impetus for brain 
enlargement. 

Plio-Pleistocene Climate Deterioration

The deterioration of climates during the last few million years should have 
dramatically increased selection for traits increasing animals’ abilities to cope with 
more variable environments. These traits include more complex cognition. Using 
a variety of indirect measures of past temperature, rainfall, ice volume, and the 
like, mostly from cores of ocean sediments, lake sediments, and ice caps, paleo- 
climatologists have constructed a stunning picture of climate deterioration over 
the last 14 million years (Lamb, 1977; Schneider and Londer, 1984; Dawson, 
1992; Partridge et al., 1995). The Earth’s mean temperature has dropped several
degrees and the amplitudes of fluctuations in rainfall and temperature have in- 
creased. For reasons as yet ill understood, glaciers wax and wane in concert with 
changes in ocean circulation, carbon dioxide, methane, and dust content of the 
atmosphere and changes in average precipitation and the distribution of precip- 
itation. The resulting pattern of fluctuation in climate is very complex. As the 
deterioration has proceeded, different cyclical patterns of glacial advance and 
retreat involving all these variables have dominated the pattern. A 21,700-year 
cycle dominated the early part of the period, a 41,000-year cycle between about 3
and 1 million years ago, and a 95,800-year cycle the last million years. 

This cyclic variation is very slow with respect to the generation time of 
animals and is not likely to have directly driven the evolution of adaptations 
for phenotypic flexibility. However, increased variance on the time scales of the 
major glacial advances and retreats also seems to be correlated with great variance 
at much shorter time scales. For the last 120,000 years, quite high-resolution data 
are available from ice cores taken from the deep ice sheets of Greenland and 
Antarctica. Resolution of events lasting only a little more than a decade is possi- 
ble in ice 90,000 years old, improving to monthly after 3,000 years ago. During 
the last glacial period, ice core data show that the climate was highly variable on 
time scales of centuries to millennia (GRIP, 1993; Lehman, 1993; Ditlevsen, 
Svensmark, and Johnson, 1996). Even when the climate was in the grip of the ice,
there were brief spikelike ameliorations of about a thousand years duration in 
which the climate temporarily reached near interglacial warmth. The intense 
variability of the last glacial period carries right down to the limits of the nearly 
10-year resolution of the ice core data. Sharp excursions lasting a century or less 
occur in estimated temperatures, atmospheric dust, and greenhouse gases. Com- 
parison of the rapid variation during this period with older climates is not yet 
possible. However, an internal comparison is possible. The Holocene (the last 
relatively warm, ice-free 10,000 years) has been a period of very stable climate, at
least by the standards of the last glacial epoch. At the decadal scale, the last glacial 
climates were much more variable than climates in the Holocene. Holocene 
weather extremes have had quite significant effects on organisms (Lamb, 1977).
It is hard to imagine the impact of the much greater variation that was probably 
characteristic of most if not all of the Pleistocene epoch. Floods, droughts, 
windstorms, and the like, which we experience once a century, might have oc- 
curred once a decade. Tropical organisms did not escape the impact of climate 
variation; temperature and especially rainfall were highly variable at low latitudes
(Broecker, 1996). During most periods in the Pleistocene, plants and animals
must generally have lived under conditions of rapid, chaotic, and ongoing re- 
organization of ecological communities as species’ ranges adjusted to the noisy 
variation in climate. Thus, since the late Miocene epoch, organisms have had to 
cope with increasing variability in many environmental parameters at time scales 
on which strategies for phenotypic flexibility would be highly adaptive. 


Brain Size Evolution in the Neogene 

Mammals show clear signs of responding to climate deterioration by developing 
more complex cognition. Jerison’s (1973) classic study of the evolution of brain
size documents major trends toward increasing brain size in many mammalian 
lineages that persist up through the Pleistocene. The time trends are complex. 
There is a progressive increase in average encephalization (brain size relative to 
body size) throughout the Cenozoic era. However, many relatively small-brained
mammals persist even in orders where some species have evolved large brains. 
The diversity of brain size increases toward the present. Mammals continue to 
evolve under strong selective pressure to minimize brain size (see section on cognitive
economics), and those that can effectively cope with climatic deterioration
by range changes or noncognitive adaptations do so. Other lineages evolve
the means to exploit the temporal and spatial variability of the environment by 
using behavioral flexibility. The latter, we suppose, pay for the cost of enceph- 
alization by exploiting the ephemeral niches that less flexible, smaller brained 
species leave underexploited. 

Humans anchor the tail of the distribution of brain sizes in mammals; we are 
the largest brained member of the largest brained mammalian order. This fact 
supports a Darwinian hypothesis. Large gaps between species are hard to account 
for by the processes of organic evolution. That we are part of a larger trend suggests 
that a general selective process such as we propose really is operating. Nevertheless, 
there is some evidence that human culture is more than just a more sophisticated 
form of typical animal cognitive strategies. More on this vexing issue follows. 

The largest increase in encephalization per unit time by far is the shift from 
Miocene and Pliocene species to modern ones, coinciding with the Pleistocene 
climate deterioration. In the last 2.5 million years, encephalization increases were 
somewhat larger than during the steps from Archaic to Paleogene and Paleo- 
gene to Neogene, each of which represents tens of millions of years of evolution. 


General Purpose versus Special Purpose Mechanisms 

To understand how evolution might have shaped cognitive adaptations to vari- 
able environments, we need to know something about the elementary properties 
of mental machinery. Psychologists interested in the evolution of cognition have 
generated two classes of hypotheses about the nature of minds. A long-standing 
idea is that cognitively sophisticated mammals and birds have evolved powerful 
and relatively general-purpose mental strategies that culminate in human intelligence
and culture. These flexible general-purpose strategies replace more rigidly
innate ones as cognitive sophistication increases. For example, Donald Campbell
(1965, 1975) emphasizes the general similarities of all knowledge-acquiring
processes ranging from organic evolution to modern science. He argues that even 
a quite fallible cognitive apparatus could nevertheless obtain workable mental 
representations of a complex variable environment by trial and error methods, 
much as natural selection shapes random mutations into organic adaptations. 
Bitterman’s (2000) empirical argument that simple and complex cognitions use
rather similar learning strategies is a kindred proposal. Jerison (1973) argues that
the main region of enlargement of bird and mammal brains in the Cenozoic era 
has been the forebrain, whose structures serve rather general coordinating 
functions. He believes that it is possible to speak of intelligence abstracted from 
the particular cognition of each species, which he characterizes as the ability to 
construct perceptual maps of the world and use them to guide behavior adaptively.
Edelman's (1987) theory of neuronal group selection is based on the
argument that developmental processes cannot specify the fine details of the 
development of complex brains and hence that a lot of environmental feedback 
is necessary just to form the basic categories that complex cognition needs to 
work. This argument is consistent with the observation that animals with more 
complex cognition require longer juvenile periods with lots of “play” to provide 
the somatic selection of the fine details of synaptic structure. In Edelman's 
argument, a large measure of phenotypic flexibility comes as a result of the 
developmental constraints on the organization of complex brains by innate pro- 
gramming. If cognition is to be complex, it must be built using structures that are 
underdetermined at birth. 

Against general-purpose hypotheses, there has long been the suspicion that 
animal intelligence can be understood only in relationship to the habitat in which 
the species lives (Hinde, 1970:659-663). Natural selection is a mechanism for
adapting the individuals of a species to particular environmental challenges. It 
will favor brains and behaviors specialized for the niche of the species. There is 
no reason to think that it will favor some general capacity that we can operationalize
as intelligence across species. A recent school of evolutionary psychologists
has applied this logic to the human case (Barkow, Cosmides, and
Tooby, 1992; Pinker, 1997; Shettleworth, 2000). The brain, they argue, even the
human brain, is not a general problem-solving device but a collection of modules 
directed at solving the particular challenges posed by the environments in which 
the human species evolved. General problem-solving devices are hopelessly 
clumsy. To work at all, a mental problem-solving device must make a number of 
assumptions about the structure of its world, assumptions that are likely to hold 
only locally. Jack of all trades, master of none. Human brains, for example, are 
adapted to life in small-scale hunting and gathering societies of the Pleistocene. 
They will guide behavior within such societies with considerable precision but 
behave unpredictably in other situations. These authors are quite suspicious of 
the idea that culture alone forms the basis for human behavioral flexibility. As 
Tooby and Cosmides (1992) put it, what some take to be cultural traditions
transmitted to relatively passive imitators in each new generation could actually 
be partly, or even mainly, “evoked culture,” innate information that leads to
similar behavior in parents and offspring simply because they live in similar en- 
vironments. In this model, human cognition is complex because we have many 
content-rich, special-purpose, innate algorithms, however much we also depend 
upon transmitted culture. 

This debate should not be trivialized by erecting straw protagonists. On the 
one hand, it is not sensible for defenders of cognitive generalism to ignore that 
the brain is a complex organ with many specialized parts, without which no 
mental computations would be possible. No doubt, much of any animal's mental 
apparatus is keyed to solve niche-specific problems, as is abundantly clear from 
brain comparative anatomy (Krubitzer, 1995) and from performance on learning 
tasks (Garcia and Koelling, 1966; Poli, 1986). Learning devices can be only 
relatively general; all of them must depend upon an array of innate processing 
devices to interpret raw sense data and evaluate whether they should be treated 
as significant (an actual or potential reinforcer). The more general a learning rule 
is, the weaker it is liable to be. 

On the other hand, one function of all brains is to deal with the unfore- 
seeable. The dimensionality of the environment is very large even for narrow spe- 
cialists, and even larger for weedy, succeeds-everywhere species like humans. 
Being preprogrammed to respond adaptively to a large variety of environmental 
contingencies may be costly or impossible. If efficient learning heuristics exist 
that obviate the need for large amounts of innate information, they will be 
favored by selection. 

When the situation is sufficiently novel, like most of the situations that rats 
and pigeons face in Skinner Boxes, every species is forced to rely upon what is, 
in effect, a very general learning capability. An extreme version of the special- 
purpose modules hypothesis would predict that animals should behave com- 
pletely randomly in environments as novel as they usually face in the laboratory. 
The fact that adaptive behavior emerges at all in such circumstances is a clear 
disproof of such an extreme position. Likewise, humans cannot be too tightly 
specialized for living in small hunting and gathering societies under Pleistocene 
conditions. We are highly successful in the Holocene epoch using far different 
social and subsistence systems. 


A Role for Social Learning in Variable Environments 

Our own hypothesis is that culture plays a large role in the evolution of human 
cognitive complexity. The case for a role for social learning in other animals is 
weaker and more controversial, but well worth entertaining. Social learning and 
culture furnish a menu of heuristics for adapting to temporally and spatially 
variable environments. Learning devices will be favored only when environments 
are variable in time or space in difficult to predict ways. Social learning is a 
device for multiplying the power of individual learning. Systems of phenotypic 
adaptation have costs. In the case of learning, an individual will have to expend 
time and energy, incur some risks in trials that may be associated with costly 
errors, and support the neurological machinery necessary to learn. Social learning
can economize on the trial and error part of learning. If kids learn from mom, 
they can avoid repeating her mistakes. “Copy mom” is a simple heuristic that 
may save one a lot of effort and be almost as effective as learning for oneself, 
provided the environment in one’s own generation is pretty much like mom’s. 
Suppose the ability to somehow copy mom is combined with a simple check of 
the current environment that warns one if the environment has changed sig- 
nificantly. If it has, one learns for oneself. This strategy allows social learners to 
avoid some learning costs but rely on learning when necessary. 
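
The trade-off in the "copy mom unless the environment has changed" strategy can be made concrete with a small expected-payoff sketch. The payoff structure is an illustrative assumption, not a model from the text: `b` is the benefit of behaving correctly, `c` the cost of learning for oneself, `k` the cost of the environmental check, and `u` the probability that the environment has changed since mom's generation. Mom's behavior is assumed to be correct for her environment, and the check is assumed to be reliable.

```python
def payoff_learn(b, c):
    """Always learn individually: correct behavior, full learning cost."""
    return b - c


def payoff_copy(b, u):
    """Always copy mom: free, but wrong whenever the environment has changed."""
    return (1 - u) * b


def payoff_copy_with_check(b, c, k, u):
    """Pay k for the check; learn (cost c) only if the environment has changed."""
    return b - k - u * c


# Hypothetical values, chosen only for the demonstration.
b, c, k, u = 1.0, 0.3, 0.05, 0.2
print(payoff_learn(b, c), payoff_copy(b, u), payoff_copy_with_check(b, c, k, u))
```

Under these assumptions the checking strategy beats pure individual learning whenever `k < (1 - u) * c`, and beats blind copying whenever `k < u * (b - c)`, so it does best when the check is cheap and environmental change is neither very rare nor nearly certain.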

We have constructed a series of mathematical models designed to test the 
cogency of these ideas (Boyd and Richerson, 1985, 1989, 1995, 1996; see also 
Cavalli-Sforza and Feldman, 1973; Pulliam and Dunford, 1980). The formal 
theory supports the story. When information is costly to obtain and when there 
is some statistical resemblance between models’ and learners’ environments, 
social learning is potentially adaptive. Selection will favor individual learners 
who add social learning to their repertoire so long as copying is fairly accurate 
and the extra overhead cost of the capacity to copy is not too high. In some 
circumstances, the models suggest that social learning will be quite important 
relative to individual learning. It can be a great advantage compared to a system 
that relies on genes only to transmit information and individual learning to adapt 
to the variation. Selection will also favor heuristics that bias social learning in 
adaptive directions. When the behavior of models is variable, individuals who try 
to choose the best model by using simple heuristics like “copy dominants” or 
“go with the majority,” or by using complex cognitive analyses, are more likely 
to do well than those who blindly copy. Contrarily, if it is easy for individuals to 
learn the right thing to do by themselves, or if environments vary little, then 
social learning is of no utility. 

A basic advantage common to many of the model systems that we have 
studied is that a system linking an ability to make adaptive decisions to an abil- 
ity to copy speeds up the evolutionary process. Both natural selection and the 
biasing decisions that individuals make act on socially learned variation. The faster 
rate of evolution tracks a variable environment more faithfully, providing a fit- 
ness return to social learning. 

Our models of cultural evolution are much like the learning model Bitter- 
man describes (2000). In fact, one of our most basic models adds social learning 
to a model of individual learning virtually identical to his in order to investigate 
the inheritance-of-acquired-variation feature of social learning. Such models are 
simple and meant to be quite general. We expect that they will apply, at least 
approximately, to most examples of social learning in nature. 

Social learning strategies could represent a component of general-purpose 
learning systems. Social learning is potentially an adaptive supplement to a weak, 
relatively general-purpose learning rule. (We accept the argument that the more 
general a learning rule is, the weaker it has to be.) However, we have modeled 
several different kinds of rules for social learning. These would qualify as dif- 
ferent modules in Shettleworth’s terms (2000). The same rule, with different 
inputs and different parameter settings, can be implemented as a component of 
many narrowly specialized modules. Psychological evidence suggests that human 
culture involves numerous subsystems and variants that use a variety of patterns 
of transmission and a variety of biasing heuristics (Boyd and Richerson, 1985).
Although all nonhuman social learning systems are, as far as we know, much 
simpler than human culture, they probably obey a similar evolutionary logic and 
vary adaptively from species to species (Chou and Richerson, 1992; Laland, 
Richerson, and Boyd, 1996).

In no system of social learning have fitness effects yet been estimated; the 
adaptiveness of simple social learning warrants skepticism. Rogers (1989; see also
Boyd and Richerson, 1995) constructed a plausible model in which two genotypes
were possible: individual learners and social learners. In his model, the social
learning genotype can invade because social learners save on the cost of learning 
for themselves. However, at the equilibrium frequency of social learners, the 
fitness of the two types is equal. Social learners are parasites on the learn- 
ing efforts of individual learners. Social learning raises the average fitness of 
individuals only if individual learners also benefit from social learning. The
well-studied system of social learning of food preference in rats is plausibly an
example of adaptive social learning (Galef, 1996), but the parasitic hypothesis
is not yet ruled out. Lefebvre’s (2000) data indicating a positive correlation
of individual and social learning suggest an adaptive combination of social and 
individual learning, although his data on scrounging in aviaries show that pigeons 
are perfectly willing to parasitize the efforts of others. We will be surprised if no 
cases of social learning corresponding to Rogers’s model ever turn up. 
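
The logic of the parasitic outcome can be illustrated with a minimal sketch in the spirit of Rogers's model. The payoff structure and all parameter names (`b`, `c`, `u`) are illustrative assumptions, not Rogers's own notation: individual learners always acquire the currently favored behavior and earn `b - c`; social learners copy a random member of the previous generation at no cost and earn `b` only if the copied behavior is still favored; the environment changes with probability `u` per generation.

```python
def social_accuracy(q, u):
    """Stationary chance that a social learner acquires the favored behavior.

    Solves p = (1 - u) * ((1 - q) + q * p): the copied model must have been
    correct, and the environment must not have changed since.
    """
    return (1 - u) * (1 - q) / (1 - (1 - u) * q)


def fitness_individual(b, c):
    """Individual learners are always correct but pay the learning cost."""
    return b - c


def fitness_social(q, b, u):
    """Social learners pay nothing but are correct only some of the time."""
    return b * social_accuracy(q, u)


def equilibrium_frequency(b, c, u):
    """Frequency of social learners at which both types have equal fitness."""
    q = (c - u * b) / ((1 - u) * c)
    return min(max(q, 0.0), 1.0)


def mean_fitness(q, b, c, u):
    """Population mean fitness at social-learner frequency q."""
    return q * fitness_social(q, b, u) + (1 - q) * fitness_individual(b, c)


# Hypothetical values, chosen only for the demonstration.
b, c, u = 1.0, 0.3, 0.1
q_star = equilibrium_frequency(b, c, u)
print(q_star, fitness_social(q_star, b, u), fitness_individual(b, c))
print(mean_fitness(0.0, b, c, u), mean_fitness(q_star, b, c, u))
```

In this sketch social learners invade when rare, because they copy accurate individual learners for free, but their accuracy falls as they become common. At the equilibrium frequency the population's mean fitness is the same as in a population of individual learners only, which is the sense in which social learners are parasites.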

The complex cognition of humans is one of the great scientific puzzles. Our 
conquest of the ultimate cognitive niche seems to explain our extraordinary 
success as a species (Tooby and DeVore, 1987). Why then has the human cognitive
niche remained empty for all but a tiny slice of the history of life on earth,
finally to be filled by a single lineage? Human culture, but not the social learning 
of most other animals, involves the use of imitation, teaching, and language to 
transmit complex adaptations subject to progressive improvement. In the human 
system, socially learned constructs can be far more sophisticated than even the 
most inspired individual could possibly hope to invent. Is complex culture the 
essence of our complex cognition or merely a subsidiary part? 


The Problem of Cognitive Economics 

To understand how selection for complex cognition proceeds, we need to know 
the costs, benefits, trade-offs, and synergies involved in using elementary cog- 
nitive strategies in compound architectures to adapt efficiently to variable envi- 
ronments. In our models we have merely assumed costs, accuracies, and other 
psychological properties of learning and social learning. We here sketch the kinds 
of knowledge necessary to incorporate cognitive principles directly into evolu- 
tionary models. 

Learning and decision making require larger sensory and nervous systems in 
proportion to their sophistication, and large nervous systems are costly (Eisenberg,
1981:235-236). Martin (1981) reports that mammalian brains vary over
about a 25-fold range, controlling for body size. Aiello and Wheeler (1995) report
that human brains account for 16 percent of our basal metabolism. Average
mammals have to allocate only about 3 percent of basal metabolism to their
brains, and many marsupials get by with less than 1 percent. These differences 
are large enough to generate significant evolutionary trade-offs. In addition to 
metabolic requirements, there are other significant costs of big brains, such as 
increased difficulty at birth, greater vulnerability to head trauma, increased po- 
tential for developmental snafus, and the time and trouble necessary to fill these 
large brains with usable information. On the cost side, selection will favor as 
small a nervous system as possible. 

If our hypothesis is correct, animals with complex cognition foot the cost of a 
large brain by adapting more swiftly and accurately to variable environments. 
Exactly how do they do it? Given just three generic forms of adaptation to variable 
environments — innate information, individual learning, and social learning — and 
two kinds of mental devices — more general-purpose and less general-purpose — 
the possible architectures for minds are quite numerous. What sorts of trade-offs 
will govern the nature of structures that selection might favor? What is the 
overhead cost of having a large repertoire of innate special-purpose rules? Innate 
rules will consume genes and brain tissue with algorithms that may be rarely 
called upon. The gene-to-mind translation during development may be difficult 
for complex innate rules. If so, acquiring information from the environment using 
learning or social learning may be favored. Are there situations where a (relative) 
jack-of-all-trades learning rule can outcompete a bevy of specialized rules? What
is the penalty paid in efficiency for a measure of generality in learning? Are there 
efficient heuristics that minds can use to gain a measure of generality without 
paying the full cost of a general-purpose learning device? Relatively general- 
purpose heuristics might work well enough over a wide enough range of envi- 
ronmental variation to be almost as good as several sophisticated special-purpose 
algorithms, each costing as much brain tissue as the general heuristic (see 
Gigerenzer and Goldstein, 1996, on simple but powerful heuristics). 

Hypothesis building here is complicated because we cannot assume that 
individual learning, social learning, and innate knowledge are simply competing 
processes. For example, more powerful or more general learning algorithms may 
generally require more innate information (Tooby and Cosmides, 1992). More 
sophisticated associative learning will typically require more sense data to make 
finer discriminations of stimuli. Sophisticated sense systems depend upon 
powerful, specialized, innate algorithms to make useful information from a mass 
of raw data from the sensory transducers (Spelke, 1990; Shettleworth, 2000). 
Hypothesis building is also complicated because we have no rules describing the 
efficiency of a compound system of some more and some less specialized mod- 
ules. For example, a central general-purpose associative learning device might 
be the most efficient processor for such sophisticated sensory data because re- 
dundant implementation of the same learning algorithm in many modules 
might be costly. Intense modularity in parts of the mind may favor general- 
purpose, shared, central devices in other parts. Bitterman’s (2000) data are 
consistent with a central associative learning processor that is similar by ho- 
mology across most of the animal kingdom. However, his data are also consis- 
tent with several or many encapsulated special-purpose associative learning 
devices that have converged on a relatively few efficient association algorithms. 
Shettleworth’s (2000) argument for modularity by analogy with perception has
appeal. If the cost of implementing an association algorithm is small relative to 
the cost of sending sensory data large distances across the brain, selection will 
favor association algorithms in many modules. However, the modularity of 
perception is surely driven in part by the fact that the different sense organs must 
transduce very different physical data. Bitterman’s (2000) data show that, once 
reduced to a more abstract form, many kinds of sense data can be operated on by 
the same learning algorithm, which might be implemented centrally or modu- 
larly. The same sorts of issues will govern the incorporation of social learning 
into an evolving cognitive system. 

There may be evolutionary complications to consider. For example, seldom- 
used special-purpose rules (or the extreme seldom-used ranges of frequently 
exercised rules) will be subject to very weak selection. More general-purpose 
structures have the advantage that they will be used frequently and hence be well 
adapted to the prevailing range of environmental uncertainty. If they work to any 
approximation outside this range, selection can readily act to improve them. 
Narrowly special-purpose algorithms could have the disadvantage that they can 
be “caught out” by a sudden environmental change, exhibiting no even mar- 
ginally useful variation for selection to seize upon, whereas more general-purpose 
individual and social learning strategies can expose variation to selection in such 
cases (Laland et al., 1996). On the other hand, we might imagine that there is a 
reservoir of variation in outmoded special-purpose algorithms, on which selection 
has lost its purchase, that furnishes the necessary variation in suddenly changed 
circumstances. 

The high dimensionality of the variation of Pleistocene environments puts a 
sharp point on the innate information versus learning/social learning modes of 
phenotypic flexibility. Mightn't the need for enough information to cope with 
such complex change by largely innate means exhaust the capacity of the genome 
to store and express it? Recall Edelman’s (1987) neuronal group selection
hypothesis in this context. Immelmann (1975) suggests that animals use imprinting
to identify their parents and acquire a concept of their species because it is not 
feasible to store a picture of the species in the genes or to move the information 
from genes to the brain during development. It may be more economical to use 
the visual system to acquire the picture after birth or hatching by using the simple 
heuristic that the first living thing one sees is mom and a member of one’s own 
species. In a highly uncertain world, wouldn’t selection favor a repertoire of 
heuristics designed to learn as rapidly and efficiently as possible? 

As far as we understand, psychologists are not yet in a position to give us the 
engineering principles of mind design the way that students of biological me- 
chanics now can for muscle and bone. If these principles turn out to favor com- 
plex, mixed designs with synergistic, nonlinear relationships between parts, the 
mind design problem will be quite formidable. We want to avoid asking silly 
questions analogous to “which is more important to the function of a modern 
PC, the hardware or the software?” However, in our present state of ignorance, 
we do run the risk of asking just such questions! 

With due care, perhaps we can make a little progress. In this chapter, we 
use a method frequently used by evolutionary biologists, dubbed “strategic 
modeling” by Tooby and DeVore (1987). In strategic modeling, we begin with
the tasks that the environment sets for an organism and attempt to deduce how 
natural selection should have shaped the species’ adaptation to its niche. Often, 
evolutionary biologists frame hypotheses in terms of mathematical models of 
alternative adaptations that predict, for instance, what foraging or mate choice 
strategy organisms with a given general biology should pursue in a particular 
environment. This is just the sort of modeling we have undertaken in our studies 
of social learning and culture. We ask: how should organisms cope with different 
kinds of spatially and temporally variable environments? 


Social Learning versus Individual Learning versus 
Innate Programming? 

Increases in brain size could signal adaptation to variable environments via in- 
dividual learning, social learning, or more sophisticated innate programming. 
Our mathematical models suggest that the three systems work together. Most
likely, increases in brain size to support more sophisticated learning or social
learning will also require at least some more innate programming. There is likely 
an optimal balance of innate and acquired information dictated by the structure 
of environmental variability. Given the tight cost/benefit constraints imposed on 
brains, at the margin we would expect to find a trade-off between social learning, 
individual learning, and innate programming. For example, those species that 
exploit the most variable niches should emphasize individual learning, whereas 
those that live in more highly autocorrelated environments should devote more 
of their nervous systems to social learning. 

Lefebvre (2000) reviews studies designed to test the hypothesis that social 
and opportunistic species should be able to learn socially more easily than the 
more conservative species, and the conservative species should be better indi- 
vidual learners. Surprisingly, the prediction fails. Species that are good social 
learners are also good individual learners. One explanation for these results is that 
the synergy between these systems is strong. Perhaps the information-evaluating 
neural circuits used in social and individual learning are partly or largely shared. 
Once animals become social, the potential for social learning arises. The two 
learning systems may share the overhead of maintaining the memory storage 
system and much of the machinery for evaluating the results of experience. If so, 
the benefits in quality or rate of information gained may be large relative to the 
cost of small bits of specialized nervous tissue devoted separately to each capacity. 
If members of the social group tend to be kin, investments in individual learning 
may also be favored because sharing the results by social learning will increase 
inclusive fitness. On the other hand, Lefebvre notes that not all learning abilities 
are positively correlated. Further, the correlation may be due to some quite 
simple factor, such as low neophobia, not a more cognitively sophisticated ad- 
aptation. 

The hypothesis that the brain tissue trade-off between social and individual 
learning is small resonates with what we know of the mechanisms of social 
learning in most species. Galef (1988, 1996), Laland et al. (1996), and Heyes and
Dawson (1990) argue that the most common forms of social learning result from
very simple mechanisms that piggyback on individual learning. In social species, 
naive animals follow more experienced parents, nestmates, or flock members as 
they traverse the environment. The experienced animals select highly nonrandom 
paths through the environment. They thus expose naive individuals to a highly 
selected set of stimuli that then lead to acquisition of behaviors by ordinary 
mechanisms of reinforcement. Social experience acts, essentially, to speed up and 
make less random the individual learning process, requiring little additional, 
specialized, mental capacity. Social learning, by making individual learning more 
accurate without requiring much new neural machinery, tips the selective bal- 
ance between the high cost of brain tissue and advantages of flexibility in favor of 
more flexibility. As the quality of information stored on a mental map increases, it 
makes sense to enlarge the scale of maps to take advantage of that fact. Eventu- 
ally, diminishing returns to map accuracy will limit brain size. 

Once again, we must take a skeptical view of this adaptive hypothesis until 
experimental and field investigations produce better data on the adaptive con- 
sequences of social learning. Aside from Rogers’s parasitic scenario, the sim- 
plicity of social learning in most species and its close relationship to individual 
learning invite the hypothesis that most social learning is a by-product of indi- 
vidual learning that is not sufficiently important to be shaped by natural selec- 
tion. Human imitation, by contrast, is so complex as to suggest that it must have 
arisen under the influence of selection. 

Eisenberg’s (1981, ch. 23) review of a large set of data on the encephalization
of living mammals suggests that high encephalization is associated with extended 
association with parents, late sexual maturity, extreme iteroparity, and long po- 
tential life span. These life cycle attributes all seem to favor social learning (but also 
any other form of time-consuming skill acquisition). We would not expect this
trend if individual and social learning were a small component of encephalization 
relative to innate, information-rich modules. Under the latter hypothesis animals 
with a minimal opportunity to take advantage of parental experience and parental 
protection while learning for themselves ought to be able to adapt to variable 
environments with a rich repertoire of innate algorithms. Eisenberg’s data suggest 
that large brains are not normally favored in the absence of social learning or social 
facilitation of individual learning. The study of any species that run counter to 
Eisenberg’s correlation might prove very rewarding. Large-brained species with a 
short period of juvenile dependence should have a complex cognition built
disproportionately of innate information. Similarly, small-brained social species with
prolonged juvenile dependence or other social contact may depend relatively 
heavily on simple learning and social learning strategies. Lefebvre and Palameta 
(1988) provide a long list of animals in which social learning has been more or less
convincingly documented. Recently, Dugatkin (1996) and Laland and Williams
(1997) have demonstrated social learning in guppies. Even marginally social species
may come under selection for behaviors that enhance social learning, as in the
well-known case of mother housecats who bring partially disabled prey to their
kittens for practice of killing behavior (Caro and Hauser, 1992).

Some examples of nonhuman social learning are clearly specialized, such as 
birdsong imitation, but the question is open for other examples. Aspects of the 
social learning system in other cases do show signs of adaptive specialization,
illustrating the idea that learning and social learning systems are only general
purpose relative to a completely innate system. For example, Terkel (1996) and
Chou (1989, personal communication) obtained evidence from laboratory studies
of black rats that the main mode of social learning is from mother to pups.
This is quite unlike the situation in the case of Norway rats, where Galef (1988,
1996) and coworkers have shown quite conclusively that mothers have no
special influence on pups. In the black rat, socially learned behaviors seem to be
fixed after a juvenile learning period, whereas Norway rats continually update
their diet preferences (the best-studied trait) based upon individually acquired
and social cues. Black rats seem to be adapted to a more slowly changing envi- 
ronment than Norway rats. Terkel studied a rat population that has adapted 
to open pinecones in an exotic pine plantation in Israel, a novel and short-lived 
niche by most standards, but one that will persist for many rat generations. 
Norway rats are the classic rats of garbage dumps, where the sorts of foods 
available change weekly. 


Human versus Other Animals’ Culture 

The human species’ position at the large-brained tail of the distribution of late 
Cenozoic encephalization suggests the hypothesis that our system of social 
learning is merely a hypertrophied version of a common mammalian system 
based substantially on the synergy between individual learning and simple sys- 
tems of social learning. However, two lines of evidence suggest that there is more 
to the story. 

First, human cultural traditions are often very complex. Subsistence sys- 
tems, artistic productions, languages, and the like are so complex that they must 
be built over many generations by the incremental, marginal modifications of 
many innovators (Basalla, 1988). We are utterly dependent on learning such
complex traditions to function normally. 

Second, this difference between humans and other animals in the complexity 
of socially learned behaviors is mirrored in a major difference in mode of social 
learning. As we saw, the bulk of animal social learning seems to be dependent 
mostly on the same techniques used in individual learning, supplemented at the 
margin by a bit of teaching and imitation. Experimental psychologists have de- 
voted much effort to trying to settle the question of whether nonhuman animals 
can learn by “true imitation” or not (Galef, 1988). True imitation is learning a
behavior by seeing it done. True imitation is presumably more complex cogni- 
tively than merely using conspecifics’ behavior as a source of cues to stimuli that 
it might be interesting to experience. Although there are some rather good ex- 
periments indicating some capacity for true imitation in several socially learning 
species (Heyes, 1996; Moore, 1996; Zentall, 1996), head-to-head comparisons of
children’s and chimpanzees’ abilities to imitate show that children begin to exceed
chimpanzees’ capabilities at about three years of age (Whiten and Custance,
1996; Tomasello, 1996, 2000). The lesson to date from comparative studies of
social learning is that simple mechanisms of social learning are much more
common and more important than imitation, even in our close relatives and other
highly encephalized species.


Why Is Complex Culture Rare? 

One hypothesis is that an intrinsic evolutionary impediment exists, hampering 
the evolution of a capacity for complex traditions. We show elsewhere that, 
under some sensible cognitive-economic assumptions, a capacity for complex cu- 
mulative culture cannot be favored by selection when rare (Boyd and Richerson, 
1996). The mathematical result is quite intuitive. Suppose that to acquire a
complex tradition efficiently, imitation is required. Suppose that efficient imi- 
tation requires considerable costly, or complex, cognitive machinery, such as 
a theory of mind/imitation module (Cheney and Seyfarth, 1990:277-230; 
Tomasello, 2000). If so, there will be a coevolutionary failure of capacity for 
complex traditions to evolve. The capacity would be a great fitness advantage, 
but only if there are cultural traditions to take advantage of. But, obviously, there 
cannot be complex traditions without the cognitive machinery necessary to 
support them. A rare individual who has a mutation coding for an enlarged 
capacity to imitate will find no complex traditions to learn and will be handi- 
capped by an investment in nervous tissue that cannot function. The hypothesis 
depends upon a certain lumpiness in the evolution of the mind. If even a small 
amount of imitation requires an expensive or complex bit of mental machinery, 
or if the initial step in the evolution of complex traits does not result in par- 
ticularly useful traditions, then there will be no smooth evolutionary path from 
simple social learning to complex culture. 
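
The "cannot be favored when rare" argument can be illustrated with a toy frequency-dependent selection model. This is only a sketch under assumptions that are not part of the cited analysis (Boyd and Richerson, 1996): the fitness function, the parameter names cost and benefit, and the idea that the stock of complex traditions available to learn is simply proportional to the frequency p of individuals carrying the imitation capacity are all hypothetical.

```python
def next_frequency(p: float, cost: float, benefit: float) -> float:
    """One generation of selection on the frequency p of the imitation capacity.

    Assumption (illustrative only): carriers pay a fixed cost for the extra
    neural machinery, and the benefit they obtain scales with p, because
    complex traditions exist only to the extent that carriers maintain them.
    """
    w_carrier = 1.0 - cost + benefit * p
    w_other = 1.0
    mean_w = p * w_carrier + (1.0 - p) * w_other
    return p * w_carrier / mean_w


def run(p0: float, cost: float = 0.1, benefit: float = 0.5,
        generations: int = 200) -> float:
    """Iterate selection from an initial frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, cost, benefit)
    return p


# A rare mutant (p0 = 0.01) starts below the threshold cost / benefit = 0.2
# and is selected against; a common capacity (p0 = 0.5) is selected for.
for p0 in (0.01, 0.5):
    print(f"start {p0:.2f} -> after 200 generations {run(p0):.4f}")
```

In this sketch the capacity spreads only if its frequency already exceeds cost / benefit, which is the sense in which some other process would have to carry it across the threshold.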

If such an impediment to the evolution of complex traditions existed, 
evolution must have traveled a roundabout path to get the frequency of the 
imitation capacity high enough to begin to bring it under positive selection for its 
tradition-supporting function. Some suggest that primate intelligence was orig- 
inally an adaptation to manage a complex social life (Humphrey, 1976; Byrne 
and Whiten, 1988; Kummer, Daston, Gigerenzer, and Silk, 1997; Dunbar, 1992,
2000). Perhaps in our lineage the complexities of managing the sexual division of 
labor, or some similar social problem, favored the evolution of the capacity to 
develop a sophisticated theory of mind. Such a capacity might incidentally make 
efficient imitation possible, launching the evolution of elementary complex 
traditions. Once elementary complex traditions exist, the threshold is crossed. 
As the evolving traditions become too complex to imitate easily, they will begin 
to drive the evolution of still more sophisticated imitation. This sort of stickiness 
in evolutionary processes is presumably what gives evolution its commonly 
contingent, historical character (Boyd and Richerson, 1992). 


Conclusion 


The evolution of complex cognition is a complex problem. It is not entirely clear 
what selective regimes favor complex cognition. The geologically recent increase 
in the encephalization of many mammalian lineages suggests that complex
cognition is an adaptation to a common, widespread, complex feature of the en- 
vironment. The most obvious candidate for this selective factor is the deterio- 
ration of the earth’s climate since the late Miocene epoch, culminating in the 
exceedingly noisy Pleistocene glacial climates. 

In principle, complex cognition can accomplish a system of phenotypic 
flexibility by using information-rich innate rules or by using more open individual 
and social learning. Presumably, the three forms of phenotypic flexibility are 
partly competing, partly mutually supporting mechanisms that selection tunes 
to the patterns of environmental variation in particular species’ niches. Because 
of the cost of brain tissue, the tuning of cognitive capacities will take place in 
the face of a strong tendency to minimize brain size. However, using strategic 
modeling to infer the optimal structure for complex cognitive systems from 
evolutionary first principles is handicapped by the very scanty information on 
trade-offs and constraints that govern various cognitive information-processing 
strategies. For example, we do not understand how expensive it is to encode 
complex innate information-rich computational algorithms relative to coping 
with variable environments with relatively simple, but still relatively efficient, 
learning heuristics. Psychologists and neurobiologists might usefully concentrate 
on such questions. 

Human cognition raises the ante for strategic modeling because of its ap- 
parently unique complexity and yet great adaptive utility. We can get modest 
but real leverage on the problem by investigating other species with cognitive 
complexity approaching ours, which in addition to great apes may include other 
monkeys, some cetaceans, parrots, and corvids (Moore, 1996; Heinrich, 2000; 
Clayton, Griffiths, and Dickinson, 2000). Our interpretation of the evidence is 
that human cognition mainly evolved to acquire and manage cumulative cultural 
traditions. This capacity probably cannot be favored when rare, even in circum- 
stances where it would be quite successful if it did evolve. Thus, its evolution 
likely required, as a preadaptation, the advanced cognition achieved by many 
mammalian lineages in the last few million years. In addition, it required an 
adaptive breakthrough, such as the acquisition of a capacity for imitation as a by- 
product of the evolution of a theory of mind capacity for social purposes. 


Norms and Bounded
Rationality 


Do Norms Help People Make Good Decisions 
without Much Thought? 

Many anthropologists believe that people follow the social norms of their society 
without much thought. According to this view, human behavior is mainly the 
result of social norms and rarely the result of considered decisions. In recent 
years, there has been increased interest within anthropology in how individuals 
and groups struggle to modify and reinterpret norms to further their own in- 
terests. However, we think it is fair to say that most anthropologists still believe 
that culture plays a powerful role in shaping how people think and what they do. 

Many anthropologists also believe that social norms lead to adaptive be- 
havior; by following norms, people can behave sensibly without having to un- 
derstand why they do what they do. For example, throughout the New World, 
people who rely on corn as a staple food process the grain by soaking it in a strong 
base (such as calcium hydroxide) to produce foods like hominy and masa (Katz,
Hediger, and Valleroy, 1974). This alkali process is complicated, requires hard
work, and substantially reduces the caloric content of corn. However, it also 
increases the amount of available lysine, the amino acid in which corn is most 
deficient. Katz et al. argue that alkali processing plays a crucial role in preventing 
protein deficiency disease in regions where the majority of calories are derived 
from corn. Traditional peoples had no understanding of the nutritional value of 
alkali processing; rather, it was a norm: we Maya eat masa because that is what we 
do. Nonetheless, by following the norm, traditional people were able to solve an 
important and difficult nutritional problem. The work of cultural ecologists, such 
as Marvin Harris (1979), provides many other examples of this kind, although
few are as well worked out. Other varieties of functionalism (for a discussion,
see Turner and Maryanski, 1979) also hold that social norms evolve to adapt to
the local environment. While nowadays anthropologists are explicitly critical of 
functionalism, cryptic functionalism still pervades much thinking in anthropol- 
ogy (Edgerton, 1992). 

Norms may also lead to sensible behavior by proscribing choices that people 
find tempting in the short run but that are damaging in the long run. Moral systems
around the world have proscriptions against drunkenness, laziness, gluttony, and 
other failures of self-control. There is evidence that such proscriptions can in- 
crease individual well-being. For example, Jensen and Ericson (1979) show that 
Mormon youths in Tucson are less likely to be involved in “victimless crimes,” 
such as drinking and marijuana use, than members of a nonreligious control group. 
Moreover, these differences seem to have consequences. McEvoy and Land (1981) 
report that age-adjusted mortalities for Mormons in Missouri are approximately 
20 percent lower than those for control populations, and the differences were 
biggest for lung cancer, pneumonia/influenza, and violent death, sources of mor- 
tality that should be reduced if the abstentious Mormon norms are being ob- 
served. Apparently, living in a group in which there are norms against alcohol 
use makes it easier for young Mormons to do what is in their own long-term 
interest. 


What Are Norms, and Why Do People Follow Them? 

Examples like these present a series of interesting questions to economists, psy- 
chologists, and others who start with individuals as the basic building blocks of 
social theory. First, what are norms? How can we incorporate the notion that 
there are shared social rules into models that assume that people are goal-oriented 
decision makers? Second, why should people follow norms? Norms will change 
behavior only if they prescribe behavior that differs from what people would do 
in the absence of norms. Finally, why should norms be sensible? If individuals 
cannot (or do not) determine what is sensible, why should norms prescribe 
sensible behavior? It seems more plausible that they will simply represent random 
noise or even superstitious nonsense. 

A recent efflorescence of interest in norms among rational choice theorists 
provides one cogent answer to the first two questions. Norms are the result of 
shared notions of appropriate behavior and the willingness of individuals to re- 
ward appropriate behavior and punish inappropriate behavior (for a review, see 
McAdams, 1997). Thus, it is a norm for men to remove their hats when they 
enter a Christian church because they will suffer the disapproval of others if they 
do not. In contrast, it is not a norm for men to remove their hats in an overheated 
country and western bar, even if everyone does so. By this notion, people obey 
norms because they are rewarded by others if they do and punished if they do not. 
As long as the rewards and punishments are sufficiently large, norms can stabilize 
a vast range of different behaviors. Norms can require property to be passed to the 
oldest son or to the youngest; they can specify that horsemeat is a delicacy or 
deem it unfit for human consumption. 

There is no consensus in this literature about why people choose to punish
norm violators and reward norm followers. There have been a number of proposals:
Binmore (1998) argues that social life is an infinite game and that norms are
game-theoretic equilibria of the kind envisioned in the folk theorem. Norm violators are
punished, and so are people who fail to punish norm violators, people who fail to
punish them, and so on ad infinitum. McAdams (1997) suggests that all people
desire the esteem of others, and because esteem can be “produced” at very low cost,
it is easy to punish norm violators by withholding esteem. Bowles and Gintis (1999) 
and Richerson and Boyd (1998) argue that group selection acting over the long 
history of human evolution created a social environment in which natural selection 
favored genes leading to a reciprocal psychology. Here we will simply assume that 
the problem of why people choose to enforce norms has somehow been solved. 


How Do Norms Solve Problems That People Cannot 
Solve on Their Own? 

Virtually all of the recent literature on norms focuses on how norms help people 
solve public goods and coordination problems (e.g., Ostrom, 1991; Ellickson, 1994). 
It does not explain why norms should be adaptive. If people do not understand why 
alkali treatment of corn is a good thing, why should they require their neighbors
to eat masa and hominy and be offended if they do not? Nor does the recent 
literature on norms explain why norms should commonly help people with prob- 
lems of self-control. If people cannot resist the temptations of alcohol, why should 
they insist that their neighbors do so? We sketch possible answers to these questions. 

Occasional Learning plus Conformism Leads to Adaptive Norms 

In this section we show how a small amount of individual learning, when cou- 
pled with cultural transmission and a tendency to conform to the behavior of 
others, can lead to adaptive norms, even though most people simply do what 
everyone else is doing. 

Why It May Be Sensible for Most People to Imitate 

It is easy to see why people may choose to imitate others when it is costly or 
difficult to determine the best behavior — copying is easier than invention, and 
plagiarism is easier than creation. However, as these examples illustrate, it is not 
clear that by saving such costs, imitation makes everybody better off; this is why 
we have patents and rules against plagiarism. We have analyzed a series of 
mathematical models which indicate that when decisions are difficult, everyone 
can be better off if most people imitate the decisions of others under the right 
circumstances (Boyd and Richerson, 1985, 1988, 1989, 1995, 1996). The fol- 
lowing simple model illustrates our reasoning. 

Consider a population that lives in an environment that switches between two 
states with a constant probability. Further assume that there are two behaviors, 
one best in each environmental state. All individuals attempt to discover the best 
behavior in the current environment. First, each individual experiments with both
behaviors and then compares the results. The results of such experiments vary for 
many reasons, and so the behavior that is best during any particular trial may be 
inferior over the long run. To represent this idea mathematically, assume that the 
observed difference in payoffs is a normally distributed random variable, X (figure 
5.1). Second, each individual can observe the behavior of an individual from the 
previous generation who has already made the decision. 

We assume that individuals combine sources of information by adopting a 
particular behavior if its payoff appears sufficiently better than its alternative; 
otherwise, they imitate. The larger the observed difference in the payoffs be- 
tween the two behaviors, the more likely it is that the behavior with the higher 
payoff actually is best. By insisting on a large difference in observed payoff, 
individuals can reduce the chance that they will mistakenly adopt the inferior 
behavior. Of course, being discriminating will also cause more trials to be in- 
decisive, and, then, they must imitate. Thus, there is a trade-off. Individuals can 
increase the accuracy of learning but only by also increasing the probability that 
learning will be indecisive and having to rely on imitation. 
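
The trade-off in this learning rule can be made concrete with a short calculation. The sketch below is illustrative rather than the model as originally analyzed: it assumes the observed payoff difference X is normal with a positive mean (so the behavior labeled "correct" really is better), and the function name rule_probabilities and the parameter values mean = 0.5 and sd = 1.0 are hypothetical choices made for the example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rule_probabilities(mean: float, sd: float, d: float):
    """Outcome probabilities for one individual using threshold d.

    X ~ Normal(mean, sd) is the observed payoff difference, with mean > 0
    meaning the first behavior is truly better (an assumption of the sketch).
    Returns (p_correct, p_wrong, p_imitate).
    """
    p_correct = 1.0 - normal_cdf((d - mean) / sd)  # X > d: adopt the better behavior
    p_wrong = normal_cdf((-d - mean) / sd)         # X < -d: adopt the inferior behavior
    p_imitate = 1.0 - p_correct - p_wrong          # -d <= X <= d: trial is indecisive
    return p_correct, p_wrong, p_imitate


# Raising the threshold makes decisive trials more accurate but rarer.
for d in (0.0, 1.0, 2.0):
    pc, pw, pi = rule_probabilities(mean=0.5, sd=1.0, d=d)
    accuracy = pc / (pc + pw)  # chance a decisive trial picks the better behavior
    print(f"d={d:.1f}  accuracy when learning={accuracy:.3f}  P(imitate)={pi:.3f}")
```

With these illustrative numbers, both the accuracy of decisive trials and the probability of having to imitate rise as d increases, which is the trade-off described in the text.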


[Figure 5.1 graphic: the learning rule. The axis of the cue X is divided into three regions: choose trait 2, imitate, and choose trait 1.]

Figure 5.1. A graphical representation of the model of individual and social learning.

Each individual observes an independent, normally distributed environmental cue, X. A
positive value of X indicates that the environment is in state 1; a negative value indicates
that the environment is in state 2. If the value of X is larger than the threshold value, d,
an individual adopts trait 1. This occurs with probability p1. If the value of the
environmental cue is smaller than -d, the individual adopts trait 2, which occurs with
probability p2. Otherwise, the individual imitates. Thus, the larger the mean value of the
cue compared to its standard deviation, the greater is the predictive value of the cue.

The optimal decision rule depends on what the rest of the population is
doing. Assume that most individuals use a learning rule that causes them to 
imitate x percent of the time — call these “common-type” individuals. There are 
also a few rare invaders who imitate slightly more often. Compared to the com- 
mon type, invaders are less likely to make learning errors. Thus, when invaders 
learn, they have a higher payoff than the common-type individuals when they 
learn. When invaders imitate, they have the same payoff as the common-type 
individuals. However, invaders must imitate more often and those who must 
imitate always have lower fitness than those whose personal information is 
above the learning threshold. To see why, think of each imitator as being con- 
nected to a learner by a chain of imitation. If the learner at the end of the chain 
learned in the current environment, then the imitator has the same chance of 
acquiring the favored behavior as does a learner. If the learner at the end of the 
chain learned in a different environment, the imitator will acquire the wrong 
trait. Thus, the invading type will achieve a higher payoff if the advantage of 
making fewer learning errors is sufficient to offset the disadvantage of imitating 
more. 
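
The chain-of-imitation argument can be sketched as a small simulation. It rests on assumptions made only for illustration: the two environmental states are symmetric, the environment switches with probability gamma each generation, imitators copy a randomly chosen member of the previous generation, and the cue parameters mean = 0.5 and sd = 1.0 are arbitrary. The function simulate and its arguments are hypothetical names, not taken from the published models.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(d: float, gamma: float, mean: float = 0.5, sd: float = 1.0,
             generations: int = 20000, burn_in: int = 1000, seed: int = 1):
    """Return (learner_accuracy, imitator_accuracy) for a population using threshold d.

    learner_accuracy: chance that a decisive trial picks the favored behavior.
    imitator_accuracy: long-run chance that the copied individual holds the
    behavior favored in the imitator's own environment.
    """
    rng = random.Random(seed)
    p_correct = 1.0 - normal_cdf((d - mean) / sd)
    p_wrong = normal_cdf((-d - mean) / sd)
    p_imitate = 1.0 - p_correct - p_wrong
    learner_accuracy = p_correct / (p_correct + p_wrong)

    q = 0.5  # fraction of the population holding the currently favored behavior
    total = 0.0
    for t in range(generations):
        switched = rng.random() < gamma
        # After a switch, what was favored last generation is now the wrong behavior.
        model_correct = 1.0 - q if switched else q
        q = p_correct + p_imitate * model_correct
        if t >= burn_in:
            total += model_correct
    return learner_accuracy, total / (generations - burn_in)


for gamma in (0.0, 0.1):
    learn, imitate = simulate(d=1.0, gamma=gamma)
    print(f"gamma={gamma:.1f}  learner accuracy={learn:.3f}  imitator accuracy={imitate:.3f}")
```

In this sketch, imitators do as well as learners when the environment never changes, and worse once some chains of imitation reach back to a learner who faced a different environment.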

This trade-off depends on how much the common type imitates. When the 
common type rarely imitates, the payoff of individuals who imitate and in- 
dividuals who learn will be similar because most imitators will imitate somebody 
who learned, and the fact that mutants make fewer learning errors will allow 
them to invade. However, as the amount of imitation increases, the payoff of 
imitating individuals relative to those who learn declines because increased imi- 
tation lengthens the chain connecting each imitator to a learner. Eventually the 
population reaches an equilibrium at which the common type can resist invasion 
by mutants that change the rate of imitation. Figure 5.2 plots the probability that 
individuals imitate (denoted as L in figure 5.1) at evolutionary equilibrium as a 
function of the quality of the information available to individuals for three dif- 
ferent rates of environmental change (for details of the calculation, see Boyd and 
Richerson, 1988). Notice that when it is difficult for individuals to determine the 
best behavior and when environments change infrequently, more than 90 percent 
of a population at equilibrium simply copies the behavior of others. 

As long as environments are not completely unpredictable, the average payoff 
at the evolutionary equilibrium is greater than the average payoff of individuals 
who do not imitate (Boyd and Richerson, 1995). The reason is simple: imitation 
allows the population to learn when the information is good and imitate when it 
is bad. Figure 5.3 plots the average payoff of imitating and learning individuals as 
a function of the fraction of individuals who imitate. The payoff of learning in- 
dividuals increases as the amount of imitation increases because individuals are 
demanding better evidence before relying on their individual experience and 
therefore are making fewer learning errors. The payoff of individuals who imitate 
because their evidence does not happen to meet rising standards also increases at 
first because they are directly or indirectly imitating learners who make fewer 
errors. If imitation is too common, the payoff to imitation declines because the 
too-discriminating population does too little learning to track the changing envi- 
ronment. The first effect is sufficient to lead to a net increase in average payoff at 
evolutionary equilibrium. 
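
A rough way to see why intermediate amounts of imitation do best is to track the expected frequency of the currently favored trait. The sketch below is not the model analyzed in Boyd and Richerson (1995). It assumes that imitators copy a random member of the previous generation, that payoff is simply the chance of holding the favored trait, and that the environment flips between two states with probability 1 − e each period. Under those assumptions the long-run expectation E satisfies E = p1 + L[eE + (1 − e)(1 − E)], which the function solves for E.

```python
def expected_correct(p1, p2, e):
    """Long-run expected frequency of the currently favored trait.

    p1: probability an individual learns the favored trait from its own cue
    p2: probability an individual learns the wrong trait from its own cue
    e:  probability the environment stays the same from one period to the next
    The remaining fraction L = 1 - p1 - p2 copies a random member of the
    previous generation (an assumption of this sketch).
    """
    L = 1.0 - p1 - p2
    denom = 1.0 - L * (2.0 * e - 1.0)
    if denom <= 0.0:
        raise ValueError("need L < 1 or e < 1 for a well-defined average")
    return (p1 + L * (1.0 - e)) / denom


if __name__ == "__main__":
    # Illustrative values: (p1, p2) for no imitation, moderate imitation,
    # and a population that only imitates.
    for p1, p2 in ((0.69, 0.31), (0.31, 0.07), (0.0, 0.0)):
        print(p1, p2, round(expected_correct(p1, p2, e=0.95), 3))
```

With these illustrative numbers the moderately selective population outperforms both the population that never imitates and the one that only imitates, matching the qualitative pattern described above.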

Figure 5.2. Probability that individuals imitate (L) at evolutionary equilibrium as a
function of the quality of the environmental cue for three different rates of environmental
change. The mean of the environmental cue (X) is 1.0, so as the standard deviation of X
increases, the extent to which the cue predicts the environmental state decreases. Thus,
the results plotted here indicate that as the predictive quality of the cue decreases, the
probability of imitation at evolutionary equilibrium increases. The parameter e is the
probability that the environment remains unchanged from one time period to the next.
Thus, as the rate of environmental change decreases, the probability of imitation at
evolutionary equilibrium increases. See Boyd and Richerson (1988) for details.


Figure 5.3. Individuals either learn or imitate according to the outcome of their learning
trial. As individuals become more selective, the frequency of imitating individuals
increases. This figure plots the expected fitness of individuals who imitate and those who
learn as a function of the probability that an individual randomly chosen from the
population imitates (assuming the outcome of learning experiments is normally
distributed with mean 0.5 and variance 1).

We believe that the lessons of this model are robust. It formalizes three basic
assumptions: 

1. The environment varies. 

2. Cues about the environment are imperfect, so individuals make 
errors. 

3. Imitation increases the accuracy (or reduces the cost) of learning.

We have analyzed several models that incorporate these assumptions but differ in 
other features. All of these models lead to the same qualitative conclusion: when 
learning is difficult and environments do not change too fast, most individuals 
imitate at evolutionary equilibrium. At that equilibrium, an optimally imitating 
population is better off, on average, than a population that does not imitate. 


Adding Conformism 

So far we have shown only that it may be best for most people to copy others 
rather than try to figure things out for themselves. Recall that for something 
to be a norm, there has to be a conformist element. People must agree on the 
appropriate behavior and disapprove of others who do not behave appropriately. 
We now show that individuals who respond to such disapproval by conforming 
to the social norm are more likely to acquire the best behavior. We will also show 
that as the tendency to conform increases, so does the equilibrium amount of 
imitation. 

To allow the possibility for conformist pressure, we add the following as- 
sumption to the model described. When an individual imitates, she may be dis- 
proportionately likely to acquire the more common variant. Let q be the fraction 
of the population using trait 1. As before, individuals collect information about
the best behavior in the current environment, and then if the information is not 
decisive, they imitate. However, now the probability (Prob) that an imitating 
individual acquires trait 1 is: 

$$\mathrm{Prob}(1) = q + A\,q(1 - q)(2q - 1) \tag{1}$$

Thus, A represents the extent to which individuals respond to the blandishments 
of others. When A = 0, individuals ignore conformist pressures, and the model is 
the same as the one described in the previous section. When A > 0, social 
pressure (or merely a desire to be like others) induces individuals to adopt the 
more common of the two behaviors. When A ≈ 1, individuals almost always
adopt the same behavior as the majority.
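
Equation 1 is easy to evaluate directly. The following minimal sketch (the function name is ours) shows how the conformist term pushes an imitator toward whichever trait is already in the majority.

```python
def conformist_prob(q, A):
    """Probability that an imitator acquires trait 1 (equation 1).

    q: fraction of the population currently using trait 1
    A: strength of conformist bias, between 0 (none) and 1 (strong)
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a frequency between 0 and 1")
    return q + A * q * (1.0 - q) * (2.0 * q - 1.0)


if __name__ == "__main__":
    for q in (0.4, 0.5, 0.6):
        print(q, conformist_prob(q, A=0.0), round(conformist_prob(q, A=1.0), 3))
```

When trait 1 is in the majority (q > 0.5) the probability of acquiring it exceeds q, when it is in the minority the probability falls below q, and at q = 0.5 the bias has no effect.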

We now determine the equilibrium values of A and L (the probability of 
relying on imitation) in the same way that we determined the equilibrium 
amount of imitation. Assume that most of the population is characterized by one 
pair of values of A and L. Then, consider whether that population can be invaded 
by individuals using slightly different values of A and L. The evolutionary equi- 
librium is the combination of values of A and L that cannot be invaded in this way. 

This analysis leads to two robust results. First, all conditions that lead a 
substantial fraction of the population to rely on imitation also lead to very strong 
conformity. Consider, for example, figure 5.4, which plots the equilibrium values 

Figure 5.4. Equilibrium values of L and A for different rates of environmental variation.
At evolutionary equilibrium, the strength of conformist transmission is high for a wide
range of rates of environmental change. However, reliance on social learning (L, as a
proportion ranging from 0 to 1.0) decreases rapidly over the same range of environmental
stability. When there is no conformist effect (A is constrained to be zero), the evolutionary 
equilibrium value of L is lower than when A is free to evolve to its equilibrium value. 
Since the conformist effect causes the population to track the environment more effec- 
tively, it makes social learning more useful. For more details on this calculation, see 
Henrich and Boyd (1998). 


of A and L as a function of the rate of environmental change. Notice that as long 
as the environment is not completely unpredictable, the equilibrium value of A 
is near its maximum value — when people imitate, they virtually always do what 
the majority of the population is doing. As detailed in Henrich and Boyd (1998), 
the equilibrium values of A and L are equally insensitive to other parameters 
in the model. Second, as conformism increases, so does the fraction of the po- 
pulation that relies on imitation. Figure 5.4 shows that the equilibrium value of 
L, when both L and A are allowed to evolve, is larger than the equilibrium value 
of L in a model in which A is constrained to be zero. Thus, a tendency to con- 
form increases the number of people who follow social norms and decreases 
the numbers who think for themselves. 

These results are easy to understand. Just after the environment switches, 
most people acquire the wrong behavior. Then, the combination of occasional 
learning and imitation causes the best behavior to become gradually more 
common in the population until an equilibrium is reached at which most of the 
people are characterized by the better behavior. For rates of environmental 
change that favor substantial reliance on imitation, the best behavior is more 
common than the alternative averaged over this entire cycle. Thus, individuals
with a conformist tendency to adopt the most common behavior when in doubt 
are more likely to acquire the best behavior. Conformism continues to increase 
until it becomes so strong that it prevents the population from responding 
adaptively after an environmental shift. Optimal conformism leads to increased 
imitation because on average conformism causes imitators to be more likely to 
acquire the best behavior in the current environment. 


Imitation of Successful Neighbors Leads to the 
Spread of Beneficial Norms 

There is a large literature that indicates that people often have time-inconsistent 
preferences and, as a result, they often make choices in the short run that they 
know are not in their long-term interest. It is plausible that social norms help 
people solve these problems by creating short-term incentives to do the right 
thing. I may not be able to resist a drink when the costs are all in the distant 
future but will make a different decision if I suffer immediate social disapproval. 
It is also easy to see why such norms persist once they are established. If
everyone agrees that self-control is proper behavior and punishes people who
disagree, then the norm will persist. The problem is that the same mechanism can
stabilize any norm. People could just as easily agree that excessive drinking is 
proper behavior and punish teetotalers. If it is true that norms often promote 
self-control, then we need an explanation of why such norms are likely to arise 
and spread. In this section, we sketch one such mechanism. 

Suppose people modify their beliefs by imitating the successful. If they 
sometimes imitate people from neighboring groups with different norms, then 
under the right circumstances norms that solve self-control problems will spread 
from one group to another because their enforcement makes people more suc- 
cessful and therefore more likely to be imitated. 

Consider a model in which a population is subdivided into n social groups 
(numbered d = 1, ..., n). There are two alternative behaviors: individuals can be
self-indulgent or abstentious. Self-indulgent individuals succumb to the temp- 
tations of strong drink, while abstentious individuals restrain themselves. Ab- 
stentious individuals are better off in the long run. They make more money, live 
longer, are healthier, and so on, and everyone agrees that the short-term plea- 
sures of the bottle are not sufficient to compensate for the long-term costs that 
result. Nonetheless, because individuals do not have time-consistent preferences, 
everyone succumbs to the temptations of the table and drinks to excess. 

Next, assume that there are two social norms governing consumption be- 
havior. People can be puritanical or tolerant. Puritans believe that alcohol con- 
sumption is wrong and disapprove of those who drink. Tolerant people believe 
everyone should make their own consumption decisions. Each type disapproves 
of the other: puritans believe that no one should tolerate excess, and the tolerant 
think that others should be tolerant as well. These norms affect the costs and 
benefits of the two behaviors. When puritans are present, the people who drink 
suffer social disapproval, and because this cost is incurred immediately, it can 
cause people to choose not to drink when they otherwise might. Thus, as the 
proportion of the population who hold puritanical beliefs increases, the
proportion of people who drink decreases, and people are better off in the long run.
To formalize these ideas, let p_d be the frequency of the puritanical norm in group
d. Then W_1, the average payoff of puritanical individuals in group d, is given by:

$$W_1(p_d) = W_0 - s(1 - p_d) + g\,p_d \tag{2}$$

W_2, the average payoff of tolerant individuals in group d, is given by:

$$W_2(p_d) = W_0 - s\,p_d + (g + \delta)\,p_d \tag{3}$$

W_0 is the baseline payoff of drinkers in a completely tolerant group. Individuals of
each type suffer disapproval and a reduction in welfare when the other type is
present in their social group. These social effects on welfare are represented by
the terms proportional to s in equations 2 and 3. However, the welfare of all
individuals is increased by the fraction of puritanical individuals because
everybody is less likely to drink when puritans are present to shame them. These effects
are represented by the terms proportional to g and g + δ. The parameter δ captures
the idea that puritans may have a different effect on each other than they do on
the tolerant: perhaps bigger because they are more sensitive to the opinions of
their own kind; perhaps smaller because they are already avoiding strong drink.
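
The two payoff functions can be written out as a short sketch. The parameter values (W0 = 1.0, s = 0.2, g = 0.3, delta = 0.1) are hypothetical and chosen only for illustration. The mean-payoff helper assumes that the group average weights the two types by their frequencies.

```python
def payoff_puritan(p, W0=1.0, s=0.2, g=0.3):
    """Equation 2: payoff of puritans in a group with puritan frequency p."""
    return W0 - s * (1.0 - p) + g * p


def payoff_tolerant(p, W0=1.0, s=0.2, g=0.3, delta=0.1):
    """Equation 3: payoff of the tolerant in a group with puritan frequency p."""
    return W0 - s * p + (g + delta) * p


def mean_payoff(p, W0=1.0, s=0.2, g=0.3, delta=0.1):
    """Frequency-weighted average payoff in the group (assumed definition)."""
    return (p * payoff_puritan(p, W0, s, g)
            + (1.0 - p) * payoff_tolerant(p, W0, s, g, delta))


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, payoff_puritan(p), payoff_tolerant(p), mean_payoff(p))
```

With these values an all-puritan group has a higher average payoff than an all-tolerant group, and the two types earn the same payoff when two-thirds of the group is puritanical.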

Next, the following process governs the evolution of these norms within a 
group. During each time period, each individual encounters another person, 
compares his welfare to the person he encounters, and then, with probability 
proportional to the difference between their payoffs during the last time period, 
adopts that person’s norm. In particular, suppose that an individual with norm i
from group f encounters an individual with norm j from group d. After the
encounter, the probability that the individual switches to j is:

$$\mathrm{Prob}(j \mid i, j) = \tfrac{1}{2}\{1 + \beta[W_j(p_d) - W_i(p_f)]\} \tag{4}$$

When the parameter β equals zero, payoffs do not affect imitation; people
imitate at random. When β > 0, people are more likely to imitate high payoff
individuals. Notice that since an individual’s payoff depends on the composition 
of his group, there will be a tendency for ideas to spread from groups in which 
beneficial norms are common to groups in which less beneficial norms are 
common. 
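
The imitation rule can be sketched as a single function. This reads equation 4 as one-half times one plus β times the payoff advantage of the person encountered. That reading of the equation, the function name, and the numerical payoffs in the usage lines are assumptions made for illustration.

```python
def switch_probability(w_own, w_other, beta):
    """Probability of adopting the norm of the person encountered.

    w_own:   payoff of the focal individual in the last period
    w_other: payoff of the individual encountered
    beta:    sensitivity of imitation to payoff differences
    """
    prob = 0.5 * (1.0 + beta * (w_other - w_own))
    if not 0.0 <= prob <= 1.0:
        raise ValueError("beta times the payoff difference must lie in [-1, 1]")
    return prob


if __name__ == "__main__":
    print(switch_probability(1.0, 1.3, beta=0.0))  # payoffs ignored
    print(switch_probability(1.0, 1.3, beta=1.0))  # better-off model, more copying
    print(switch_probability(1.3, 1.0, beta=1.0))  # worse-off model, less copying
```

The check on the result makes explicit a constraint the rule needs in order to be a probability: β must be small enough relative to the largest possible payoff difference.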

Let m_df be the probability that an individual from group f encounters an
individual from group d. Δp_f, the change in p_f during one time period, is given by:

$$
\begin{aligned}
\Delta p_f ={} & \beta p_f\,[W_1(p_f) - \bar{W}(p_f)] \\
& + \sum_{d \neq f} m_{df}\,\beta\,\{p_d[W_1(p_d) - \bar{W}(p_d)] - p_f[W_1(p_f) - \bar{W}(p_f)]\} \\
& + \sum_{d \neq f} m_{df}\,(p_d - p_f)\{1 + \beta[\bar{W}(p_d) - \bar{W}(p_f)]\}
\end{aligned}
\tag{5}
$$

To make sense of this expression, first assume that people only encounter in- 
dividuals from their own social group. 

$$\Delta p_f = \beta p_f\,[W_1(p_f) - \bar{W}(p_f)] \tag{6}$$

This is the ordinary replicator dynamic equation. This equation simplifies to
have the form: 


$$\Delta p_f = \alpha\,p_f(1 - p_f)(p_f - \hat{p}) \tag{7}$$

where α = β(2s − δ) and p̂ = s/(2s − δ). Thus, when each social group is isolated
and the effects of social sanctions are large compared to the effects of drinking
(2s > δ), there are two stable evolutionary equilibria: groups consisting of all
puritans or all tolerant individuals. If the presence of puritans benefits other
puritans more than it benefits the tolerant (δ < 0), then the all-puritan
equilibrium has a larger basin of attraction. If puritans benefit the tolerant more, then
the all-tolerant equilibrium has a larger basin of attraction.
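
The simplification of the replicator equation can be checked in a few lines. The derivation below assumes that the barred W is the frequency-weighted mean payoff in group f, that is, p_f W_1(p_f) + (1 − p_f) W_2(p_f). This is the usual convention for replicator dynamics, but the definition is an assumption here.

```latex
\begin{align*}
W_1(p_f) - \bar{W}(p_f) &= (1 - p_f)\,[W_1(p_f) - W_2(p_f)] \\
W_1(p_f) - W_2(p_f) &= -s(1 - p_f) + g\,p_f + s\,p_f - (g + \delta)\,p_f
                     = (2s - \delta)\,p_f - s \\
\Delta p_f &= \beta\,(2s - \delta)\; p_f (1 - p_f)\left(p_f - \frac{s}{2s - \delta}\right)
\end{align*}
```

For example, with the hypothetical values s = 0.2 and δ = 0.1, the unstable interior equilibrium lies at a puritan frequency of two-thirds: an isolated group that starts above it moves toward all puritans, and one that starts below it moves toward all tolerant.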

When there is contact between different groups, the last two terms in 
equation 5 affect the change in frequency of norms within social groups. The 
third term is of most interest here. If β = 0, this term is proportional to the
difference in the frequency of puritanism between the groups and simply
represents passive diffusion. If, however, β > 0, there is a greater flow of norms from
groups with high average payoff to groups with lower average payoff. This dif- 
ferential flow arises because people imitate the successful and norms affect the 
average welfare of group members. Can this effect lead to the systematic spread 
of beneficial norms? 

For the beneficial puritanical norm to spread, two things must occur. First, 
such a norm must increase to substantial frequency in one group. Second, it must 
spread to other groups. Here we address only the second question. To keep 
things simple, we further assume that social groups are arranged in a ring and 
that individuals have contact only with members of two neighboring groups. 
Now, suppose that a random shock causes the puritan norm to become common 
in a single group. Will this norm spread? To answer this question, we have simulated
this model for a range of parameter values. Representative results are
shown in figure 5.5, which plots the ranges of parameters over which the beneficial
norm spreads. The vertical axis gives the ratio of m (the probability that individuals
interact with others outside of their group) to α (the rate of change due to
imitation within groups), and the horizontal axis plots p̂ (the unstable equilibrium
that separates the domains of attraction of the puritanical and tolerant equilibria
in isolated groups). The shaded areas give the combinations of m/α and p̂
that lead to the spread of the puritanical norm to all groups, given that it was
initially common in a single group, for two values of g.

First, notice that the beneficial norm spreads most easily when the level 
of interaction between groups is intermediate. If there is too much mixing, the 
puritanical norm cannot persist in the initial population. It is swamped by 
the flow of norms from its two tolerant neighbors. If there is too little mixing, 
the puritanical norm remains common in the initial population but cannot 
spread because there is not enough interaction between neighbors for the ben- 
eficial effects of the norm to cause it to spread. 
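
A simulation of this kind can be sketched in a few dozen lines. The version below is not the code behind figure 5.5. It computes the expected change in each group by applying the pairwise imitation rule to every kind of encounter, and it assumes that an individual meets a member of its own group with probability 1 − m and a member of each neighboring group with probability m/2. All parameter values and function names are hypothetical.

```python
def w1(p, W0, s, g):
    """Equation 2: payoff of puritans in a group with puritan frequency p."""
    return W0 - s * (1.0 - p) + g * p


def w2(p, W0, s, g, delta):
    """Equation 3: payoff of the tolerant in a group with puritan frequency p."""
    return W0 - s * p + (g + delta) * p


def step(p, m, beta, W0=1.0, s=0.2, g=0.3, delta=0.1):
    """Advance the puritan frequency in every group on the ring by one period."""
    n = len(p)
    new_p = []
    for f in range(n):
        # Assumed contact scheme: own group, then left and right neighbors.
        contacts = ((f, 1.0 - m), ((f - 1) % n, m / 2.0), ((f + 1) % n, m / 2.0))
        change = 0.0
        for d, weight in contacts:
            # A tolerant individual in f meets a puritan from d and may switch.
            gain = (1.0 - p[f]) * p[d] * 0.5 * (
                1.0 + beta * (w1(p[d], W0, s, g) - w2(p[f], W0, s, g, delta)))
            # A puritan in f meets a tolerant individual from d and may switch.
            loss = p[f] * (1.0 - p[d]) * 0.5 * (
                1.0 + beta * (w2(p[d], W0, s, g, delta) - w1(p[f], W0, s, g)))
            change += weight * (gain - loss)
        new_p.append(p[f] + change)
    return new_p


def simulate(n=20, m=0.1, beta=1.0, periods=500, start=0.9, **params):
    """Start with puritans common in group 0 only and iterate the dynamics."""
    p = [start] + [0.0] * (n - 1)
    for _ in range(periods):
        p = step(p, m, beta, **params)
    return p


if __name__ == "__main__":
    for m in (0.0, 0.05, 0.2, 0.6):
        final = simulate(m=m)
        print(f"m={m:.2f}  groups above one-half: {sum(x > 0.5 for x in final)}")
```

With m = 0 the groups are isolated, so the initial puritan group goes to fixation and the norm never leaves it. Varying m and the payoff parameters then shows whether, under these assumptions, the norm is swamped, stays put, or spreads around the ring.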

Second, to understand the effect of g, consider the case in which g = 0. Even
when the norm produces no benefit to individuals as it becomes common, it can
still spread if the puritanical norm has a larger basin of attraction in an isolated
population (δ < 0). In this case, the costly disapproval of harmless pastimes can

Figure 5.5. Plots parameter combinations that lead to the spread of the
group-beneficial norm between groups. The vertical axis gives the ratio of m, the probability
that individuals interact with others outside of their group, to α, the rate of change
due to imitation within groups. The horizontal axis plots p̂, the unstable equilibrium
that separates the domains of attraction of all puritanical and tolerant equilibria in
isolated groups. The shaded areas give the combinations of m/α and p̂ that lead to the
spread of the puritanical norm given that it has become common in a single group for 
two values of g, the extent to which individual behavior is affected by norms. Notice 
that the beneficial norm spreads when the level of interaction between groups is 
intermediate. If there is too much mixing, the puritanical norm cannot persist in the 
initial population. If there is too little mixing, it can persist in the initial population but 
cannot spread. 



seriously handicap the tolerant when puritans are only moderately common. To 
understand why, consider a focal group at the boundary between the spreading 
front of puritan groups and the existing population of tolerant groups. The focal 
group, in which both norms are present, is bounded on one side by a group in 
which puritan norms are common and on the other side by a group in which 
tolerant norms are common. Since groups on both sides of the boundary have 
the same average payoff, the flow of norms will tend to move the focal group 
toward an even balance of the two norms. If the domain of attraction of the pu- 
ritanical norm includes 0.5 and if there is enough mixing, then mixing with 
neighboring groups can be enough to tip the focal group into the basin of attrac- 
tion of the puritanical norm. This is true even though the differential success 
owes only to puritans avoiding the costs imposed on the tolerant by puritans. To 
see why increasing g increases the range of values of p that allow the beneficial 
norm to spread, consider again a focal group on the boundary between the 
regions in which the puritanical norm is common and uncommon. When g > 0, 
individuals in the focal group are more likely to imitate someone from the
neighboring group where the puritanical norm is common than the other 
neighboring group where tolerant individuals are common because individuals 
from the former group are more successful. Therefore, the flow of norms will 
tend to move the focal group toward a frequency of puritans greater than 0.5. 

It is interesting to note that the rate at which this process of equilibrium 
selection goes on seems to be roughly comparable to the rate at which traits 
spread within a single group under the influence of the same learning process. 
Game theorists have considered a number of mechanisms of equilibrium se- 
lection that arise because of random fluctuations in outcomes due to sampling 
variation and finite numbers of players (e.g., Samuelson, 1997). These processes
also tend to pick out the equilibrium with the largest domain of attraction. How- 
ever, unless the number of individuals in the population is very small, the rate at 
which this occurs is very slow. In contrast, in the simulations we performed, the 
group beneficial trait spread from one population to the next at a rate roughly 
half the rate at which the same imitate-the-successful mechanism led to the 
spread of a trait within an isolated group. Of course, we have not accounted for 
the rate at which the beneficial norm becomes common in an initial group. This 
requires random processes. However, only the group, not the whole population, 
needs be small, and the group must be small only for a short period of time for 
random processes to give rise to an initial “group mutation,” which can then 
spread relatively rapidly to the population as a whole. 


Conclusion: Are Norms Usually Sensible? 

We have shown that it is possible for norms to guide people toward sensible 
behavior that they would not choose if left to their own devices. Norms could be 
sensible, just as functionalists in anthropology have claimed. However, the fact 
that they could be sensible does not mean that they are sensible. There are some 
well-studied examples, like the alkali treatment of corn, and there are many 
other plausible examples of culturally transmitted norms that seem to embody 
adaptive wisdom. However, as documented in Robert Edgerton’s book, Sick 
Societies (1992), there are also many examples of norms that are not obviously
adaptive, and, in fact, some seem spectacularly maladaptive. Such cases might 
result from the pathological spread of norms that merely handicap the tolerant 
without doing anyone any good (and perhaps harm puritans as well?). Or they
might result from antiquated norms that persist in a frequency above a large 
basin of attraction for tolerance, having lost their original fitness-enhancing effect 
due to social or environmental change. More careful quantitative research on the 
costs and benefits of alternative norms would clearly be useful. 

We believe that it is also important to focus more attention on the processes 
by which norms are shaped and transmitted. Anthropologists and other social 
scientists have paid scant attention to estimating the magnitude of evolutionary 
processes affecting culture change in the field or lab, although several research 
programs demonstrate that such estimates are perfectly practical (Aunger, 1994; 
Insko et al., 1983; Labov, 1980; Rogers, 1983; Rosenthal and Zimmerman, 1978;
Soltis, Boyd, and Richerson, 1995). What happens to a Maya who does not
utilize the normative form of alkali treatment of corn in her traditional society? 
What are the nutritional effects? The social effects? From whom do people learn 
how to process corn? How does this affect which variants of the process are 
transmitted and which are not? Only by answering such questions will we learn 
why societies have the norms they have and when norms are adaptive. 

PART 2

Ethnic Groups and Markers


Human populations are richly subdivided into groups marked by 
seemingly arbitrary symbolic traits, including distinctive styles of dress, cui- 
sine, or dialect. Such symbolically marked groups often have distinctive moral 
codes and norms of behavior, and sometimes exhibit economic specialization. 
Ethnic groups provide the most obvious example of such groups, but the 
phenomenon includes groups based on class, region, religion, gender, and 
profession. Ethnic groups, present in all historical periods, often split and 
merge through time, yet many have substantial historical continuity. 
Nowadays ethnic groups can have millions of members, but even in simple 
hunting and gathering societies, symbolically marked groups are much larger 
than the residential band, typically linking roughly one thousand people. 

The evidence is fairly clear that the symbolic marking is not simply a 
by-product of a common cultural heritage. If cultural boundaries were 
impermeable, like species boundaries, then this fact would explain the 
association between symbolic markers and other traits. However, group 
boundaries are highly permeable. The movement of people and ideas between 
groups attenuates group differences. Thus, the persistence of existing 
boundaries and the birth of new ones indicate that other social processes resist 
the homogenizing effects of migration and the strategic adoption of ethnic 
identities. Moreover, since groups are typically fairly large, such processes 
likely produce symbolic marking as an unintended by-product of human 
choices made for some other reason. 

The following two chapters explore the idea that symbolically marked 
groups arise and are maintained because dress, dialect, and other markers 
allow people to identify in-group members. In chapter 6, we analyze a model 
that assumes that identifying in-group members is useful because it allows
selective imitation. Rapid cultural adaptation makes the local population 
a valuable source of information about what is adaptive in the local 
environment. Individuals are well advised to imitate locals and avoid learning 
from immigrants who bring ideas adapted to other environments. In chapter 7,
we (along with Richard McElreath) study a model in which markers allow
selective social interaction. Rapid cultural adaptation can preserve differences 
in moral norms between groups. It’s best to interact with people who share 
beliefs about what is right and wrong, what is fair, and what is valuable. Thus, 
once there are reliable symbolic markers, selection will favor the psychological 
propensity to imitate and interact selectively with individuals who share those 
markers. 

These models have several interesting and, at least to us, less-than-obvious 
properties. First, the same nonrandom interaction that makes markers 
useful also creates and maintains variation in symbolic marker traits as an 
unintended by-product. Nonrandom interaction acts to increase correlation 
between arbitrary markers and locally adaptive behaviors. This, in turn, makes 
markers more useful, setting up a positive feedback process that can amplify 
small differences in markers between groups. Second, this process is not 
sufficient by itself to generate group markers. There must be some initial, 
perhaps weak correlation between symbolic expression and group 
membership, and there has to be some kind of population structure so that 
groups are at least partly isolated from each other. Otherwise, the positive 
feedback process cannot get started. Third, once groups have become sharply 
marked, the feedback process is sufficient by itself to maintain group marking 
even if groups are perfectly mixed and there is no population structure other 
than that caused by the markers. The models also make a number of 
interesting predictions about the spatial and temporal patterns of symbolic 
expression. 

If the processes captured in these models are important in creating 
ethnic groups, then ethnic groups should have arisen as soon as cumulative 
cultural evolution became important in the human lineage. Rapid cumulative 
cultural evolution is an engine for generating important differences between 
groups, both in subsistence technology and other kinds of local ecological 
adaptation, and differences in moral norms and other determinants of social 
behavior and institutions. Thus, this picture of ethnicity predicts that symbolic 
markers should appear in the archaeological record around the same time as 
the signs of cumulative cultural adaptation, which we take to be increased 
variation in space and highly refined cultural adaptations. 

One very important and widespread component of ethnicity is not 
obviously an entailment of these models of ethnicity, namely, ethnocentrism. 
Quite commonly, but as Brewer and Campbell (1976) showed, by no means
universally, people derogate members of other ethnic groups. In many times 
and places, these feelings led to interethnic conflict. We can think of two 
reasons why the kinds of ethnic groups that arise in these models might be 
associated with in-group favoritism and out-group bias. The first is that, on our 
view, groups of people who share distinctive moral norms, particularly norms 
that govern social interaction, quite likely become ethnically marked. This
suggests that ethnocentric judgments easily arise because “we the people” 
behave properly, while those “others” behave improperly, doing disgusting, 
immoral things, and showing no remorse for it, either. Second, as we explain 
in part 3 on group cooperation, we expect group selection to work at levels of 
population structure at which there is lots of cultural variation affecting group 
success. Quite plausibly, the differences in subsistence and moral systems 
assumed in these models would give rise to group selection at the level of 
ethnic groups, particularly group selection driven by differential imitation of 
successful groups. This in turn implies that ethnic groups should be one locus 
of economic, political, and military cooperation. Of course, cooperation 
within groups creates competition between groups for the resources that 
people want, and resulting norms lead to in-group cooperation. 

6


The Evolution of 
Ethnic Markers 


Much of the debate about human sociobiology has been framed as 
a binary opposition. Sociobiologists argue that evolutionary theory is useful for 
understanding humans because much of our behavior is currently adaptive, or was 
adaptive under food-foraging conditions. To be sure, they aver, culture occa- 
sionally causes human behavior to drift away from the fitness-maximizing op- 
timum, but in the long run behaviors that have important effects on Darwinian 
fitness should tend to be adaptive. Critics of this view argue that the existence of 
culture has allowed the human species to transcend ordinary evolutionary im- 
peratives. Culturally transmitted behavior must not be so maladaptive as to lead 
to the extinction of the social group, but as long as this rather weak constraint 
is satisfied, it is argued, people are free to elaborate their culture more or less as 
they please. 

We believe that this dichotomy is false. Culture is neither autonomous and 
free to vary independently of genetic fitness, nor is it simply a prisoner of genetic 
constraints. Our rejection of this dichotomy is based on what we call the “dual 
inheritance” theory of the interaction of genes and culture (Boyd and Richerson, 
1985). The essential feature of this theory is that, like genes, culture should be 
viewed as a system of inheritance. People acquire beliefs, attitudes, and values 
from others by social learning and then transmit them to others. Human behav- 
ior results from the interaction of genetically and culturally inherited informa- 
tion. In the theoretical models we have constructed to represent this interaction, 
two results stand out: (1) The cultural system of inheritance has many properties 
that make it quite different from the genetic system. For example, an individual 
can observe the behavior of a number of peers and choose the “best” behavior. 
Such properties may often enhance genetic fitness because they allow modes of 
adaptation not available to noncultural species. (2) These same properties can
lead to the evolution of many cultural traits that are costly divergences from 
those that would increase genetic fitness. Culture is an evolutionarily active part 
of a system that, jointly with genes and environment, can account for much of 
human behavioral variation. 

Here we will illustrate this general argument in the context of a particular 
problem, the evolution of markers of group membership. One of the most strik- 
ing and unusual features of the human species is that it is subdivided into ethnic 
groups. Barth (1969) identified what we take to be the critical feature of eth- 
nicity: people identify themselves, and are identified by others, as members of an 
ethnic group based on a set of culturally transmitted characters. Some of these 
traits, such as language, dress style, ritual, and cuisine, appear to be arbitrary sym- 
bolic “markers” of ethnic affiliation, while others are more directly functional 
cultural traits such as basic moral values and standards of excellence. Member- 
ship in a particular ethnic group can have important effects on an individual’s 
economic behavior and political and social interactions. 

The interpretation of ethnic markers is controversial. Sahlins (1976) has 
argued that one must choose between functional explanations and nonfunctional 
cultural explanations of symbolic marker characters. We will show that this 
dichotomy oversimplifies the relationship between genetic and cultural evolu- 
tion. Ethnicity provides a good example of how functional organic adaptation 
and symbolic cultural processes are thoroughly intertwined in human evolution. 
Our argument is based on an evolutionary model embodying two mechanisms 
that cause a population occupying a variable environment to be subdivided on 
the basis of ethnic markers. These mechanisms result from a pattern of encul- 
turation in which individuals are disproportionately influenced by two kinds of 
people: those who are similar to themselves and those who are successful. Even 
though these two mechanisms cause groups to become differentiated based on 
arbitrary symbolic markers in a way that could not be predicted from fitness 
maximization alone, they will be favored by natural selection because they allow 
more accurate adaptation to variable environments. 

This application of dual inheritance theory emphasizes the fitness-enhancing 
properties of culture. We have chosen this emphasis for two reasons. First, it is 
interesting to try to understand why a cultural system of inheritance arose in the 
hominid lineage and how that process shaped the way that culture is transmit- 
ted. Most likely, the organic capacities that allow culture to be stored and trans- 
mitted arose through the action of natural selection. In the context of this 
example, we are interested in why selection favored mechanisms of cultural 
transmission that give rise to ethnic groups. Second, the reasons why culture is 
adaptive are both subtle and interesting. Even when culture is highly adaptive, it 
has its own evolutionary properties and can lead to patterns of behavior that could 
not be understood in the absence of knowledge of how cultural processes operate. 
To understand why ethnic markers allow more accurate adaptation to variable 
environments, one must understand how the cultural processes that give rise to 
ethnic differentiation operate. We have discussed the properties of cultural
inheritance that lead to genetically maladaptive behavior elsewhere (Boyd and
Richerson, 1985). Knauft (1987) also gives an intriguing empirical example of 
how the differences between genetic and cultural inheritance can give rise to
behavior that is genetically maladaptive. 


Models of Cultural Evolution 

We define culture as information — skills, attitudes, beliefs, values — capable of 
affecting individuals’ behavior, which they acquire from others by teaching, 
imitation, and other forms of social learning. A particular member of a set of 
attitudes, beliefs, and values will be referred to as a cultural variant. (See Boyd
and Richerson, 1985, ch. 3 for an extended discussion of this definition.) We
have adopted this definition because it focuses attention on the means by which 
cultural traditions are perpetuated. Culture is acquired by individuals by teach- 
ing, imitation, and other forms of social learning from other individuals, stored in 
individual brains, and transmitted by teaching and imitation to others. 

Recently, there has been a fair amount of interest in applying concepts drawn 
from evolutionary biology to the problem of cultural evolution (e.g., Campbell, 
1975; Cavalli-Sforza and Feldman, 1981; Boyd and Richerson, 1985). Despite
the fact that cultural and genetic evolution differ in important ways, this meth- 
odological borrowing has been fruitful because genes and culture both have 
population-level properties. That is, individual behavior depends in part on the 
cultural variation in the population from which individuals acquire cultural 
variants. At the same time, which cultural variants are available in the population 
to be acquired depends on what happened to individuals with different variants 
in the population in the past. For example, in every generation some individuals 
will invent or learn new behaviors, modifying the variants they originally imitated 
and transmitting the new variants to others in the process of enculturation. Cul- 
tural evolution can be viewed as a complex of sampling and modifying processes 
that operate iteratively on a population of variable culture-bearing individuals. 
That there is a very general analogy between genes and culture is a commonplace 
observation; what is new is the reworking of methods of analysis developed by 
evolutionary biologists to build a useful theory from the old analogy. 

Simple mathematical models are among the most important tools that 
biologists use to study population-level processes. The tradition of their use 
began in evolutionary biology with Wright, Fisher, and Haldane in the first part 
of the last century and is continued today by people like John Maynard Smith, 
W. D. Hamilton, and many others. The goal of such models is to isolate the 
population-level consequences of a limited set of processes by stripping away all 
of the confusing detail due to other processes. For example, kin-selection models 
address the question: when can selection favor behaviors that reduce the fitness 
of the individual performing them, given that they increase the fitness of other 
individuals affected by the behavior? In such models virtually all the actual be- 
havioral and ecological detail is suppressed, so that exactly the same mathe- 
matical model is applied, for example, to coalition behavior among macaques 
and communal nesting in scrub jays. The intent of the model is to give insight 
into kin selection as a generic evolutionary process, not to account for the details 
of particular examples of the process. Evolutionary biologists construct many 
such simple models, each isolating one or a few processes. That a particular
process is neglected in a model is not to say that it is unimportant, only that we
desire to focus on something else for the moment. This sort of theorizing is 
sometimes stigmatized as “reductionistic.” A more apt characterization would 
be “modular.” Real evolutionary phenomena are complex; except for deliber- 
ately controlled experiments, we expect to link several such models together to 
achieve a satisfactory explanation of real events. 

Nevertheless, the study of the simple modules in isolation is useful because 
it has proven difficult to deduce the population-level consequences of individual 
processes using verbal reasoning alone. Population processes involve the inter- 
action of phenomena occurring at two different levels of organization and two 
distinct time scales. The individual and population levels of organization interact 
through the sampling processes inherent in reproduction or socialization. The 
day-to-day ecological time scale, on which processes of change act (e.g., selective 
mortality), interacts with the long-run evolutionary time scale on which
adaptations of particular kinds are or are not produced. Even the simplest examples
of evolutionary processes are thus rather complex. Mathematics makes it rela- 
tively easy to consistently and systematically trace the implications of a given 
set of assumptions, especially when the processes modelled are probabilistic or 
quantitative. Simple but formal models are a useful mental prosthesis to reduce 
the handicap of a certain kind of cognitive limitation. It is important to realize
that such models serve a rather narrow function, the testing of explanations for 
logical consistency. While they are tremendously useful in this role, they are only 
a supplement to other theoretical and empirical tools in the social and biological 
sciences, not a replacement for them. 


A Model of the Evolution of Ethnic Markers 

The existence of ethnic groups and similarly marked social units suggests two 
evolutionary questions: (1) What are the processes that would cause a human
population to split into two groups distinguished by cultural marker traits?
(2) Could such processes give rise to cultural variation that is biologically
adaptive in the sense of increasing reproductive success? 


Motivating the Model 

Let us approach these two questions by turning the second one around: how 
should natural selection have shaped the processes by which individuals acquire 
culture? At the very least, this way of viewing the problem ought to be appro- 
priate for considering the origin of organic capacities that make culture possible. 
Consider an ancient human population that has recently expanded into a new 
habitat. Some individuals in the new habitat will have adopted beliefs and values 
that are appropriate in the new habitat, but many will share the values and 
beliefs of individuals in the old habitat. This lag in cultural adaptation could 
result from at least two factors: (1) innovation is slow and the occupation of the
new habitat is recent; (2) there is an exchange of individuals between habitats, so
that some individuals in the new habitat acquired their beliefs and values in the 
old habitat. If either of these two factors obtains, many individuals will carry
variants that are appropriate in the old habitat, but not in the new one. Assuming 
that natural selection plays a strong role shaping cultural capacities, it will struc- 
ture the acquisition of culture so that individuals in each habitat have the best 
chance of acquiring the set of beliefs and values that are appropriate there. 

If one set of beliefs or values has easily observable advantages relative to the 
others, then there is an easy answer: individuals should adopt the beliefs and 
values that maximize reproductive success. It seems likely, however, that people 
commonly must choose among variant beliefs where it is quite difficult to de- 
termine which belief is most advantageous, even though the beliefs, in fact, 
differ in utility. Behavioral decision theorists (Nisbett and Ross, 1980) and
students of social learning (Rosenthal and Zimmerman, 1978) argue from empirical
evidence that the complexity and number of real decisions force people to use
simple rules of thumb. Chief among these is a heavy reliance on imitation to 
acquire most of their behavior. 

Studies of the diffusion of innovations (summarized in Rogers with Shoe- 
maker, 1971) suggest that people often use two simple rules to increase the 
likelihood that they acquire locally adaptive beliefs by imitation. The chance that 
individual A will adopt an innovation modeled by individual B often seems to 
depend upon (1) how successful B is, and (2) the similarity of A to B. When it is 
difficult to evaluate whether an innovation is sensible, imitating the successful 
seems like a good general rule; if the innovation is beneficial, people who use it 
will be more successful, on the average, than those who do not. It also seems 
sensible to condition adoption on similarity. If a model is very different from one- 
self, the model’s success might not indicate that the innovation would be useful in 
one’s own circumstances. In the interests of simplicity, we will model a situation 
in which success and similarity are the only adoption rules people use. As in the 
case of kin-selection models, the model is meant to yield insight into the oper- 
ation of this particular pair of decision rules as their effects are integrated over 
individuals and time to produce evolutionary results. Since many other important 
processes are left out, the model is meant to apply partially and qualitatively to a 
great many cases, but to be a complete quantitative description of none. 

How cultural populations will evolve under the influences of these two 
processes depends a great deal on what people use as indicators of success and 
similarity. Because our focus here is on the problem of the origin of a capacity for 
culture under the influence of natural selection, we will assume that the index of 
success is a correlate of genetic fitness and that the index of similarity is a con- 
spicuous symbolic character, like dialect, acquired from primary socializers such 
as parents. As far as the formal model is concerned, any standard of success or 
similarity can be substituted. If these assumptions are relaxed, the model may still 
be appropriate to understanding how ethnic groups form, but not to the problem 
of how such a capacity evolved in the first place. Ethnicity might be a costly 
by-product of some other advantage associated with ability to recognize success 
and similarity. The narrow interpretation we give here is not meant to prejudge 
these empirical issues. (See Boyd and Richerson, 1985, ch. 8, for a model in which
the standard of success is explicitly cultural and in which it departs very sharply 
from what selection on genes would favor.) 

Is the evolution of ethnic markers possibly an adaptive result of using these 
two rules in cases where more direct decisions are too costly to use? It is fairly 
obvious that if most people adopt beliefs or values modeled by successful people, 
beliefs or values that lead to success will spread. It seemed possible to us that 
coupling a propensity to imitate the successful with a propensity to adopt the 
beliefs and values of those who are similar to oneself might cause groups oc- 
cupying different habitats to become culturally isolated from each other because 
the cultural markers used to judge similarity would diverge in the two popula- 
tions. To check the cogency of this intuition, we analyzed the following model. 

Formalizing the Intuition 

Real environments and real means of exploiting them are complex. However, we 
think that the cogency of intuitions can be evaluated using quite simple models. 
Accordingly, we imagine that there are two ecological “niches” that differ ac- 
cording to the optimal value of an “adaptive” character. For example, suppose 
that there are two habitats — one moist, one dry. The adaptive character could 
be a belief that affects the extent to which a person relies on stock raising as 
opposed to cultivation. This belief might be the extent to which an individual 
believes that cattle ownership is an intrinsic measure of a person’s worth as a 
human being. In the dry habitat the most successful subsistence strategy might 
be pure pastoralism, and thus the optimal value of the adaptive character is a 
heavy valuation of cattle. In the moist habitat the most successful strategy might 
involve mostly horticulture, and a lesser valuation of cattle might lead to a more 
successful subsistence strategy. 

To represent these assumptions mathematically, we suppose that each in- 
dividual’s subsistence strategy can be characterized by a single number labeled A. 
This can be thought of as an index of the extent to which individuals’ beliefs lead 
them to depend on stock raising. The habitats are labeled 1 and 2, and the
optimal values of A are θ₁ and θ₂. The more that an individual’s adaptive character
deviates from the optimum in his or her habitat, the lower on average will be his
or her success (and genetic fitness). More mathematical detail is given in the
appendix in Boyd and Richerson (1987). In terms of the example, θ₁ might be
the value of A that corresponds to mostly pastoralism, and θ₂ mostly
horticulture. In what follows we will sometimes refer to the adaptive character as the
amount of pastoralism, in order to make the presentation less abstract. The 
reader should keep in mind, though, that the adaptive trait is not meant to refer 
to any specific situation. Rather, it is meant to formalize the idea that different 
beliefs and values are more or less adaptive in different environments. 
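
The assumption that success declines with distance from the local optimum can be sketched in a few lines of Python. The text defers the actual functional form to the appendix of Boyd and Richerson (1987), so the Gaussian shape used here, the function name `success`, the `width` parameter, and the numerical optima are all illustrative assumptions, not the authors' specification.

```python
import math

# Hypothetical optima for habitats 1 and 2 (illustrative values only).
THETA = {1: 1.0, 2: -1.0}

def success(a, habitat, width=1.0):
    """Relative success of an individual with adaptive trait `a`.

    Assumed Gaussian fall-off: success equals 1 at the local optimum
    and declines as the trait deviates from it in either direction.
    """
    deviation = a - THETA[habitat]
    return math.exp(-(deviation ** 2) / (2.0 * width ** 2))

# A heavy valuation of cattle (a = 1.0) pays off in habitat 1 but not in habitat 2.
print(success(1.0, 1), success(1.0, 2))
```

Any single-peaked function would serve the same illustrative purpose; the Gaussian is chosen only because it is simple.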

We assume, further, that each individual is characterized by an arbitrary 
neutral “marker” character. For example, the marker trait might be an index 
of dialect, such as the extent to which people pronounce r’s. It is arbitrary and 
neutral in the sense that many dialect variants with no direct effect on adaptive 
success are possible, although, as we shall see, there may be very strong indirect
effects of marker traits upon fitness. Once again, we will assume that the marker
trait can be described by a single number, labeled M. Thus, in the context of the
model, each individual’s culturally acquired beliefs can be described by a pair of
numbers, A and M.

Figure 6.1. Assumed life cycle of cultural transmission.

We assume that these two cultural traits are transmitted according to the life 
cycle shown in figure 6.1. This life cycle is meant to reflect the fact that children, 
adolescents, and young adults have different patterns of enculturation.
Individuals acquire their marker trait (e.g., their dialect) at an early age from a set of
primary socialization agents (“socializers” for short). They acquire their adaptive 
trait at a later age by observing the behavior of a much wider range of individuals 
whom we will refer to as “models.” Socializers need not be biological kin — the 
key assumption is only that the amount of mixing between habitats is much greater 
for models than for socializers (w ≫ m). As we shall see, this condition allows the
differentiation of marker traits, hence a sense of ethnic distinctiveness, to build 
in the local environment. We further assume that dialect is acquired through a 
process of faithful copying. That is, on average people acquire the dialect of the 
community in which they were raised. We formalize this idea by assuming that
each naive individual has the opportunity to observe the behavior of n socializers. 
Naive individuals then adopt a weighted average of the dialect of n socializers as 
their own dialect. The fact that socializers may have different weights is meant to 
represent the idea that some individuals may be more important in transmission 
than others due to kinship, social status, or some other factor. 
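
The faithful-copying rule for the marker trait amounts to a weighted average, which the following minimal sketch makes explicit. The function name `acquire_marker` and the example weights are hypothetical, and the small transmission errors that a fuller model might include are left out.

```python
def acquire_marker(socializer_markers, weights):
    """Return the weighted average of the socializers' marker values."""
    if len(socializer_markers) != len(weights):
        raise ValueError("need exactly one weight per socializer")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * m for w, m in zip(weights, socializer_markers)) / total

# Three hypothetical socializers; the first (say, a parent) counts double.
child_marker = acquire_marker([0.2, 0.6, 1.0], [2.0, 1.0, 1.0])
print(child_marker)
```

Because the weights are normalized, only their relative sizes matter, which matches the idea that some socializers are simply more influential than others.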

In later phases of the life cycle, the adaptive trait is not acquired through 
faithful copying. Rather, the acquisition of the adaptive trait is biased by two 
processes. When individuals initially acquire their adaptive trait from models as 
teenagers, they are predisposed to imitate individuals who have similar marker 
traits (i.e., have similar dialects). This idea is represented mathematically by
assuming that the basic influence of a model (due to social role and the like) is
reduced as the difference between the individual’s and the model’s marker trait
increases (in absolute value). Subsequently, individuals modify both their
adaptive and marker traits by imitating the successful individuals among their local
young adult peers. We represent this idea mathematically by assuming that 
individuals select one peer to imitate and weight this peer’s modifying influence 
in proportion to his or her success. 
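
The two biased steps just described can be sketched as follows. The text says only that a model's influence shrinks as the absolute marker difference grows and that a peer's modifying influence is proportional to his or her success; the exponential discount, the `sensitivity` and `rate` parameters, and all function names below are assumptions made for illustration.

```python
import math

def similarity_weights(own_marker, model_markers, base_weights, sensitivity=1.0):
    """Normalized influence of each model, discounted by marker distance.

    Assumed discount: exp(-sensitivity * |own marker - model marker|).
    """
    raw = [b * math.exp(-sensitivity * abs(own_marker - m))
           for b, m in zip(base_weights, model_markers)]
    total = sum(raw)
    return [r / total for r in raw]

def acquire_adaptive(own_marker, models, base_weights, sensitivity=1.0):
    """Similarity-weighted average of the models' adaptive traits.

    `models` is a list of (adaptive, marker) pairs.
    """
    weights = similarity_weights(own_marker, [m for _, m in models],
                                 base_weights, sensitivity)
    return sum(w * a for w, (a, _) in zip(weights, models))

def modify_by_success(own, peer, peer_success, rate=0.5):
    """Shift own (adaptive, marker) pair toward one peer's pair.

    The size of the shift is proportional to the peer's success (0 to 1).
    """
    step = rate * peer_success
    return tuple(o + step * (p - o) for o, p in zip(own, peer))

# A teenager with marker 1.0 observes one similar and one dissimilar model.
models = [(1.0, 1.0), (-1.0, -1.0)]
adaptive = acquire_adaptive(1.0, models, base_weights=[1.0, 1.0])
print(adaptive)
print(modify_by_success((adaptive, 1.0), (1.0, 1.2), peer_success=0.9))
```

With `sensitivity=0.0` the similarity bias disappears and the two models count equally, which is a convenient baseline for comparison.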

Our goal is to study how these transmission and choice processes might 
change the distribution of culturally transmitted variation in a population through 
time. In particular, we want to know whether different values of the marker trait 
will come to predominate in the two habitats and whether this difference ensures 
that more people acquire the locally adaptive trait. The first step is to describe 
the nature of the cultural variation in the population at some point in time. To 
do this, we use the joint distribution of the two traits in the population. This 
distribution simply specifies the fraction of the population that is characterized 
by each pair of values, A and M. The shape of such a distribution can be sum- 
marized by five numbers. The two means give the “position” of the distribution. 
For example, the mean of A tells us the degree to which, on average, the population relies on
stock raising. The two variances describe the spread of the distribution. For ex- 
ample, a large variance of A would mean a wide range of subsistence techniques 
in use in the population. The covariance tells us the extent to which the two traits 
are correlated. A nonzero covariance means that individuals who rely largely 
upon pastoralism tend to have a similar dialect, different from the dialect most 
commonly used by horticulturalists. 
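
For a concrete population, the five summary numbers can be computed directly. This is a minimal sketch: the function name `summarize`, the dictionary keys, and the example population are hypothetical, and population (divide-by-n) variances are used as one reasonable convention.

```python
def summarize(population):
    """Return the two means, two variances, and covariance of (A, M) pairs."""
    n = len(population)
    if n == 0:
        raise ValueError("population must not be empty")
    mean_a = sum(a for a, _ in population) / n
    mean_m = sum(m for _, m in population) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in population) / n
    var_m = sum((m - mean_m) ** 2 for _, m in population) / n
    cov_am = sum((a - mean_a) * (m - mean_m) for a, m in population) / n
    return {"mean_A": mean_a, "mean_M": mean_m,
            "var_A": var_a, "var_M": var_m, "cov_AM": cov_am}

# Two herders who share one dialect and two farmers who share another.
population = [(2.0, 1.0), (2.0, 1.0), (0.0, -1.0), (0.0, -1.0)]
print(summarize(population))
```

In this example the covariance is positive because reliance on stock raising and the dialect index rise together.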

The next step is to see how the distributions of A and M in the two popu- 
lations change through a single generation. To do this, we must determine how 
events in the lives of individuals change the distribution of cultural variants in 
the population. First, we assume that when the generation begins, the means and 
variances that describe the distribution of cultural variants in the population are 
at initial values. Then we construct submodels to represent individual movement 
from population to population and the two forms of biased imitation. The effects 
of each individual’s behavior on the properties of the population are very small, 
but aggregated over all individuals they may cause an appreciable change by the 
beginning of the next generation. It is this part of the model that does the important 
work of linking individual- and population-level processes. In what follows we 
provide a qualitative description of the most important effects of each process. 

Faithful copying leaves the mean value of the marker trait, M, in each habitat
unchanged. This result follows from the assumption that naive individuals faith- 
fully copy the marker of their socializers, who are in turn an unbiased sample of 
the previous generation. 

Mixing of individuals between the environments creates covariance between 
the adaptive character and the marker character in the populations of models, 
even if there was no association before mixing in either habitat. To see why, 
suppose that in habitat 1 people have beliefs that cause them to depend more on 
pastoralism than do people in habitat 2. This means that the value of A, the 
mean value of the adaptive trait, is larger in habitat 1 than in habitat 2. Now 
suppose that the values of M, the mean values of the marker trait, in the two 
habitats are different — for example, individuals in habitat 1 might be more likely 
to pronounce their r’s. Then a model drawn from habitat 1 will be more likely to 
have large values of A and M, while a model from habitat 2 will tend to have 
small values. Thus, models who practice pastoralism will tend to pronounce 
their r’s and those who practice horticulture will tend not to, even if there was 
no association between the two traits in either habitat before mixing. Mixing also 
moves the mean values of A and M in the two habitats toward each other. If no 
other processes affect the means, the populations in both habitats will eventually 
be characterized by the same values of A and M, even though the habitats are 
quite different. 
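
The claim that mixing alone creates covariance can be checked with a small numerical example. The two populations below are hypothetical and are constructed so that A and M are uncorrelated within each habitat. When the within-habitat covariance is zero, pooling a fraction p of models from one habitat with a fraction 1 - p from the other gives a covariance of p(1 - p) times the product of the differences in mean A and mean M.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(pairs):
    """Population covariance of a list of (A, M) pairs."""
    mean_a = mean([a for a, _ in pairs])
    mean_m = mean([m for _, m in pairs])
    return mean([(a - mean_a) * (m - mean_m) for a, m in pairs])

# Hypothetical habitats: every combination of A and M occurs equally
# often, so the two traits are uncorrelated within each habitat.
habitat_1 = [(0.5, 0.5), (0.5, 1.5), (1.5, 0.5), (1.5, 1.5)]          # means (1, 1)
habitat_2 = [(-1.5, -1.5), (-1.5, -0.5), (-0.5, -1.5), (-0.5, -0.5)]  # means (-1, -1)

# Models observed in habitat 1 when one model in four comes from habitat 2.
model_pool_1 = habitat_1 * 3 + habitat_2

print(covariance(habitat_1), covariance(habitat_2))  # no association before mixing
print(covariance(model_pool_1))                      # positive association after mixing
print(mean([a for a, _ in model_pool_1]))            # mean A pulled toward habitat 2
```

Here p = 0.25 and both means differ by 2, so the formula gives 0.25 × 0.75 × 2 × 2 = 0.75, which is what the pooled sample returns.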

Biased transmission based on similarity causes the mean value of the adaptive 
trait among individuals who have just acquired their adaptive trait to be closer to 
the mean in their habitat before mixing than the mean adaptive trait among their 
models. By imitating the adaptive trait of people who are like themselves with 
regard to the marker trait, naive individuals reduce the chance that they will 
imitate a model drawn from the other habitat. Thus, this form of biased imi- 
tation has the effect of reducing the amount of mixing. The strength of this 
effect depends on the difference between the mean marker trait in the two 
habitats. If the dialects are not very different, biased imitation based on similarity 
will have little effect. If the dialects are quite different, the result will be to 
substantially reduce the effect of mixing. 
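
How strongly similarity bias damps mixing can be illustrated with a rough calculation. The sketch assumes that a naive individual's marker sits at the local mean, that local models are at marker distance zero and foreign models at a distance equal to the gap between the habitat means, and that influence is discounted exponentially with distance. The function name `effective_mixing` and the `sensitivity` parameter are hypothetical.

```python
import math

def effective_mixing(mix, marker_gap, sensitivity=1.0):
    """Share of total model influence that comes from the other habitat.

    `mix` is the raw fraction of models drawn from the other habitat.
    """
    discount = math.exp(-sensitivity * abs(marker_gap))
    foreign = mix * discount
    local = 1.0 - mix
    return foreign / (local + foreign)

# The same raw mixing rate matters less and less as the dialects diverge.
for gap in (0.0, 0.5, 2.0):
    print(gap, round(effective_mixing(0.2, gap), 3))
```

With no marker difference the effective rate equals the raw rate, so the bias has no effect; as the gap widens, foreign models are increasingly ignored.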

Biased transmission based on success moves the mean value of the adaptive 
trait toward the optimum in both habitats and causes the mean values of the 
marker traits in the two habitats to diverge from each other. Suppose that in 
habitat 1 individuals who rely mostly on pastoralism are more successful on the 
average than individuals who rely mostly on horticulture. Then individuals 
whose beliefs cause them to rely more on pastoralism will be more likely to be 
imitated, and such beliefs will spread. The same process will cause the mean 
values of the marker traits to diverge because of the covariance between the 
marker trait and the adaptive trait that is induced by mixing. Suppose that
individuals who rely on pastoralism tend to pronounce their r’s. Then the practice
of imitating successful people will cause the pronunciation of r’s to spread be- 
cause successful people will tend to pronounce their r’s. 
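
The way a neutral marker is carried along by imitation of the successful can be seen in a small example. As a simplification, choosing one peer and weighting that peer by success is approximated here by a success-weighted average over all peers. The Gaussian success function, the function names, and both example populations are assumptions made for illustration.

```python
import math

def success(a, theta, width=1.0):
    """Assumed Gaussian success function peaked at the optimum theta."""
    return math.exp(-((a - theta) ** 2) / (2.0 * width ** 2))

def mean_marker(population):
    return sum(m for _, m in population) / len(population)

def success_weighted_mean_marker(population, theta):
    """Mean marker among peers when each peer is weighted by success."""
    weights = [success(a, theta) for a, _ in population]
    total = sum(weights)
    return sum(w * m for w, (_, m) in zip(weights, population)) / total

theta_1 = 2.0  # hypothetical optimum: heavy reliance on pastoralism

# (A, M) pairs. In the first population herders tend to pronounce their r's.
correlated = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
# In the second population dialect is unrelated to subsistence.
uncorrelated = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]

for name, pop in (("correlated", correlated), ("uncorrelated", uncorrelated)):
    print(name, mean_marker(pop), success_weighted_mean_marker(pop, theta_1))
```

Only in the correlated population does weighting by success shift the mean marker, which is the sense in which the marker change depends on the covariance induced by mixing.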

The analysis presented so far tells us only what will happen to the distri- 
bution of cultural variants in the two habitats over the course of one generation. 
Normally such changes will be quite small, there will be competing effects, and 
the direction of change will be dependent on several interacting factors. Our goal
is to find out what will happen to the population over the long run. To accom- 
plish this goal, we use various techniques to iterate the equations that describe 
the change over one generation. These techniques allow us to accomplish the second 
difficult step of evolutionary reasoning, the connection of short-time-scale ecological 
processes with their eventual evolutionary results. Assuming that the amount of 
mixing of primary socializers is small enough that it may be neglected, two 
important results emerge from such an analysis. Starting with a single, nearly 
uniform population that comes to occupy two habitats: (1) The mean value of 
the adaptive trait in each habitat approaches the optimum, and (2) the mean 
values of the marker trait in the populations become quite different. 

These general properties are illustrated by the numerical simulation of the 
model shown in figure 6.2. 

These qualitative results make sense in the light of the processes described 
previously. The mean value of the adaptive trait is affected by two forces — 
mixing causes the mean in the two habitats to approach each other, while biased 
cultural transmission based on success causes the means to approach the opti- 
mum in each habitat. The impact of mixing depends on the difference in the 
mean marker traits, both because increasing this difference increases the co- 
variance created by mixing and because it makes biased transmission based on 
similarity more effective in causing people to imitate models with more adaptive 
variants. Thus, increasing the difference in the mean marker traits will cause the 
mean adaptive trait in each habitat to move toward the optimum. This in turn
will cause the mean marker traits to diverge. This positive feedback cycle will
come to a halt only when the mean adaptive trait stops changing, which occurs
when the adaptive trait in each habitat is at the optimum.

Figure 6.2. Representative trajectory of the mean value of the adaptive character, the
marker character, and the covariance in the two habitats.
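
The feedback cycle can be explored with a deliberately crude recursion on the four habitat means. This is a toy stand-in, not the authors' equations, which appear in the appendix of Boyd and Richerson (1987). It assumes an exponential similarity discount, unit variance in the adaptive trait, no mixing of socializers, and a marker response proportional to the mixing-induced covariance, an idea borrowed from the standard quantitative-genetic treatment of correlated characters. All names and parameter values are hypothetical.

```python
import math

def effective_mixing(mix, marker_gap, sensitivity):
    """Share of model influence from the other habitat (assumed form)."""
    discount = math.exp(-sensitivity * abs(marker_gap))
    return mix * discount / ((1.0 - mix) + mix * discount)

def step(state, theta=(1.0, -1.0), mix=0.2, sensitivity=2.0, response=0.5):
    """Advance the means (a1, a2, m1, m2) by one generation."""
    a1, a2, m1, m2 = state
    e = effective_mixing(mix, m1 - m2, sensitivity)
    # Mixing pulls each habitat's mean adaptive trait toward the other's.
    a1_mixed = (1.0 - e) * a1 + e * a2
    a2_mixed = (1.0 - e) * a2 + e * a1
    # Covariance created by pooling two groups with different means.
    cov = e * (1.0 - e) * (a1 - a2) * (m1 - m2)
    # Success bias moves the adaptive mean toward the local optimum and
    # drags the marker mean along in proportion to the covariance.
    gap1 = theta[0] - a1_mixed
    gap2 = theta[1] - a2_mixed
    return (a1_mixed + response * gap1,
            a2_mixed + response * gap2,
            m1 + response * cov * gap1,
            m2 + response * cov * gap2)

def run(generations=200, initial_marker_gap=0.1):
    """Iterate from a nearly uniform population with a small marker gap."""
    state = (0.0, 0.0, initial_marker_gap / 2.0, -initial_marker_gap / 2.0)
    history = [state]
    for _ in range(generations):
        state = step(state)
        history.append(state)
    return history

history = run()
a1, a2, m1, m2 = history[-1]
print(f"adaptive means: {a1:.3f}, {a2:.3f}; marker gap: {m1 - m2:.3f}")
```

In this deterministic toy a marker gap of exactly zero never grows, so the run starts from a small nonzero difference; in a stochastic model, sampling variation would be expected to supply that initial difference.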

These results suggest that subdivision of a population into culturally semi- 
isolated groups based on arbitrary symbolic traits such as dialect can result from 
using the success and similarity choice rules. The same analysis also indicates that 
the tendency to imitate similar individuals can be genetically adaptive. Consider 
an individual who does not use similarity as a criterion in weighting poten- 
tial models for the adaptive trait. On average, such an individual will acquire a 
value of the adaptive trait that is farther from the optimum in his or her habitat 
than an individual who does use similarity. If, as we have assumed, the criteria 
by which success is judged are correlated with reproductive success, then indi- 
viduals who use similarity to bias their enculturation will have higher fitness than 
those who do not. If one further assumes that the nature of the imitative process 
is affected by heritable genetic variation, then natural selection will give rise to a 
cultural transmission system that is biased in favor of imitating culturally similar 
individuals. 
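
The comparison between an individual who uses similarity and one who does not can be put in numbers. The sketch assumes two habitats whose mean adaptive traits already sit at hypothetical optima of 1.0 and -1.0, a raw mixing rate of 0.2, a similarity rule that reduces the influence of foreign models to one tenth, and a Gaussian success function. All of these values and names are illustrative.

```python
import math

def expected_trait(local_mean, foreign_mean, mix, discount):
    """Expected adaptive trait acquired from a mixed pool of models.

    `discount` multiplies the influence of foreign models; 1.0 means
    the learner ignores similarity altogether.
    """
    w_foreign = mix * discount
    w_local = 1.0 - mix
    return (w_local * local_mean + w_foreign * foreign_mean) / (w_local + w_foreign)

def success(a, theta, width=1.0):
    """Assumed Gaussian success function peaked at the optimum theta."""
    return math.exp(-((a - theta) ** 2) / (2.0 * width ** 2))

theta = 1.0  # local optimum
ignores_similarity = expected_trait(1.0, -1.0, mix=0.2, discount=1.0)
uses_similarity = expected_trait(1.0, -1.0, mix=0.2, discount=0.1)

for label, trait in (("ignores similarity", ignores_similarity),
                     ("uses similarity", uses_similarity)):
    print(label, round(trait, 3), round(success(trait, theta), 3))
```

The success values are evaluated at the expected trait rather than averaged over individuals, which is enough to show the direction of the advantage but not its exact size.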


Testing the Model 

The modeling exercise sketched here tells us only that the posited forms of 
biased cultural transmission could give rise to ethnically differentiated popula- 
tions. How can we find out whether the posited mechanisms have anything to do 
with the actual formation of ethnic groups in the real world? Such an extremely 
simple model cannot be expected to produce precise numerical predictions that 
can be sharply tested in the fashion of physics. However, empirical data can be 
brought to bear on the veracity of the model in two different ways: First, one can 
investigate whether the basic individual-level processes assumed in the model 
are reasonable. Are ethnic markers typically acquired at an early age compared to 
other cultural traits? Do people use success and similarity as criteria for imita- 
tion? If the assumed processes do not capture at least part of the way that 
ethnicity structures cultural transmission, then the model is unlikely to be use- 
ful. Second, we can examine the model for qualitative predictions, and use com- 
parative or historical data to test them. In this section we present three 
predictions that can be used in this way. Again, we can expect only qualitative 
predictions from the model, and only a statistical pattern of confirmation, given 
noisy data from a complex real world. Nevertheless, if these three predictions 
were to fit a significant number of empirical cases, our confidence that similarity 
and success rules play a substantial role in the evolution of arbitrary marker traits 
would increase. 

1. If two neighboring ethnic groups are of unequal size, the smaller of the groups 
will have more extreme values of ethnic marker traits and a higher covariance be- 
tween ethnic markers and any adaptive specialty that characterizes the ethnic group. 
The results shown in figure 6.2 are based on the assumption that populations 
living in each habitat are the same size. When the populations are different sizes, 
the mean value of the marker trait diverges farther from its original state in the 
smaller population than in the larger one, and the covariance between marker 
trait and adaptive trait at equilibrium is larger. Both of these effects result from 
the smaller group’s experiencing a greater amount of mixing than the larger 
group. 

2. If two groups come into contact, the larger the initial difference between them 
with respect to ethnic markers, the more likely they will come to adopt different eco- 
logical specialties. It may be quite difficult for groups that are too similar to diverge 
so as to optimally adapt to two different environments. The rate at which two 
groups diverge with respect to both the marker character and the adaptive 
character depends critically on the initial difference between the two populations 
with regard to the marker character. If the mean value of the marker character is 
very similar in both populations, then they will diverge very slowly because the 
covariance created by mixing will be very small. A small covariance will, in turn, 
slow down the rate at which both the adaptive and marker traits in the two 
populations diverge, which in turn keeps the covariance small. In contrast, when 
the populations are initially quite different, the divergence of both adaptive and 
marker characters will be rapid. In the context of the model, the ultimate equi- 
librium states are the same. However, in the real world, in which many processes 
affect the spread and persistence of ethnic groups, we would expect effects that 
occur rapidly to be much more important. Thus, we would expect that when two 
ethnic groups come into contact, the chance that they will come to adopt dif- 
ferent ecological specialties will be increased if they are initially more distinct. We 
would also predict that the chance that a group entering an area can displace an 
existing population from an ecological or economic specialization will increase if 
the entering group is ethnically distinct. If the groups are sufficiently similar, 
techniques sufficiently superior to be the basis for displacement will rapidly reach 
similar frequency in both groups due to the effects of migration. 

3. When ethnic groups occupy large, contiguous territories, the greatest amount 
of ethnic differentiation should occur at the boundary between groups. In many cases 
of interest, ethnic groups occupy contiguous spatial territories. Individuals who 
are close to the boundary between two groups will have more encounters with 
members of another ethnic group than individuals more distant from the 
boundary. To model this situation, imagine a number of populations arranged 
along a transect in space, and at some point along the transect the optimal value 
of the adaptive character changes abruptly. For example, an abrupt change in 
altitude might lead to a change in rainfall regime. Finally, suppose that there is 
migration or mixing of individuals along the transect so that mixing is more 
likely among neighboring populations than among distant ones. With these as- 
sumptions, the model predicts that the degree of differentiation with respect to 
the marker trait is greatest at the boundary between the two environments, 
where mixing creates the greatest covariance between the adaptive trait and the 
marker trait. 

We are not aware of the existence of data to test these predictions, but 
the information required is of a type that anthropologists are well equipped to 
obtain. 





Discussion 

The model presented here suggests that the modes of cultural transmission that 
give rise to ethnically subdivided populations are adaptive because they allow pop- 
ulations to more accurately track a heterogeneous environment. Similar processes 
may favor the development of symbolically marked caste, class, occupational, and 
professional subgroups within complex societies. The process of imitating people 
like oneself sets up a self-reinforcing process that causes subpopulations occupying 
different habitats, or pursuing different economic strategies in the same environ- 
ment, to become culturally isolated. Thus, the mean value of the adaptive trait in 
each habitat converges to the optimum. A population using only transmission 
based on success would adapt much less quickly to a variable habitat. 

It is noteworthy that this mode of adaptation is closed to animals that lack 
a capacity for culture. Such differences between genetic and cultural evolution 
ought to be reflected in basic differences between the natural history of humans 
and other animals. It is interesting that the human species occupies a much 
broader range of habitats than any other mammalian species. Consider the pri- 
mates: if all baboons are classified as a single species, then it is the primate 
species with the widest geographical range, a substantial fraction of sub-Saharan 
Africa. Our closest relatives, chimpanzees and gorillas, are restricted to the trop- 
ical forests of Africa. In contrast, even with only hunting and gathering tech- 
nology, humans occupied virtually every terrestrial habitat. 

Most contemporary theories of speciation hold that a population must occupy 
more than one ecological niche in order for speciation to occur (Templeton, 
1981). Once a portion of a population has adapted genetically to a particular 
niche, selection will favor mechanisms that prevent mating with individuals 
living in some other niche, because the offspring that result from such matings 
will be inferior in both niches. Whether multiple niches are sufficient, or some 
additional factor such as an isolating barrier is necessary, is not completely clear. 
The data from other primate species suggest, however, that typical primate 
species occupy much smaller ranges than the human species, presumably because 
reproductive barriers were favored by selection as successful primates extended 
their ranges to sufficiently different habitats. 

Unlike other mammals, humans acquire massive amounts of adaptive infor- 
mation culturally. Perhaps it is not coincidental that symbol-using humans of the 
late Pleistocene epoch became very widely distributed for a biological species. 
The processes modeled here, by allowing the protection of culturally transmitted 
adaptations to local conditions without genetic isolation, can be considered a 
cultural substitute for speciation. Undoubtedly many aspects of cultural trans- 
mission allow adaptation to a wide range of habitats. However, it does seem 
plausible that the fact that the human species is divided into distinct groups that 
are culturally isolated from each other may play a role in allowing humans to be 
culturally polymorphic and thus to occupy such a wide range of ecological niches. 
This intuition is reinforced by studies like those of Fredrik Barth, which suggest 
that contemporary ethnic groups often occupy different ecological niches. 
This interpretation illustrates, in the context of a rather simple model, how 
adaptive modes of cultural transmission lead to outcomes that could not be pre- 
dicted without taking cultural processes explicitly into account. Even if one 
assumes that the criteria by which success is judged are coincident with repro- 
ductive success, only the properties of cultural transmission allow populations to 
adapt rapidly to a variable environment. An adaptive outcome — the differenti- 
ation of local groups with regard to marker traits — can be understood only in 
terms of cultural processes. We believe that this argument ought to be very 
interesting to cultural anthropologists. We have not had to leave the confines of 
adaptationist assumptions to show how the properties of culture play a fundamental 
role in human evolution. 

However, once the use of such rules as success and similarity arises, selection 
on genes underlying the capacity for culture may not be able to prevent the 
violation of adaptationist assumptions. For example, processes closely related 
to those modeled here can lead to the “runaway” evolution of marker and pref- 
erence traits, which have no adaptive or functional explanation (Boyd and 
Richerson, 1985, ch. 8). It is easy to imagine that the adaptive uses of cultural 
markers are common enough so that selection on genes maintains a cognitive ca- 
pacity to use them despite the runaway process carrying some to maladaptive 
extremes. We are convinced that complexities of this sort are a pervasive feature 
of the coevolutionary process that links genes and culture. If this idea is correct, 
any attempt to reduce the problems of human evolution to binary choices be- 
tween sociobiological and cultural explanations is bound to fail. The real puzzle 
is to determine how the genetic and cultural systems interact in a unified evo- 
lutionary process. 


NOTE 

We thank Bruce Knauft, Robert Paul, and Joan Silk for thoughtful comments on the 
first draft of this chapter. 


REFERENCES 

Barth, F. 1969. Introduction. Ethnic groups and boundaries. F. Barth, ed. Boston: Little 
Brown. 

Boyd, R., & P. J. Richerson. 1985. Culture and the evolutionary process. Chicago: 
University of Chicago Press. 

Boyd, R., & P. J. Richerson. 1987. The evolution of ethnic markers. Cultural Anthro- 
pology 2:65-79. 

Campbell, D. T. 1975. On the conflicts between biological and social evolution and 
between psychology and moral tradition. American Psychologist 30:1103-1126. 

Cavalli-Sforza, L. L., & M. W. Feldman. 1981. Cultural transmission and evolution. 
Princeton, NJ: Princeton University Press. 

Knauft, B. M. 1987. Divergence between cultural success and reproductive fitness in 
preindustrial cities. Cultural Anthropology 2:94-114. 




Nisbett, R., & L. Ross. 1980. Human inference: Strategies and shortcomings of social 
judgment. Englewood Cliffs, NJ: Prentice-Hall. 

Rogers, E. M., with F. F. Shoemaker. 1971. The communication of innovations: A cross- 
cultural approach. New York: Free Press. 

Rosenthal, T., & B. Zimmerman. 1978. Social learning and cognition. New York: Aca- 
demic Press. 

Sahlins, M. 1976. Culture and practical reason. Chicago: University of Chicago Press. 

Templeton, A. 1981. Mechanisms of speciation — A population genetic approach. 
Annual Review of Ecology and Systematics 12:23-48. 



7 Shared Norms and the 

Evolution of Ethnic Markers 

With Richard McElreath 


Unlike other primates, human populations are often divided into 
ethnic groups that have self-ascribed membership and are marked by seemingly 
arbitrary traits such as distinctive styles of dress or speech (Barth, 1969, 1981). 
The modern understanding that ethnic identities are flexible and ethnic 
boundaries porous makes the origin and existence of such groups problematic 
because the movement of people and ideas between groups will tend to atten- 
uate group differences. Thus, the persistence of existing boundaries and the birth 
of new ones suggests that there must be social processes that resist the ho- 
mogenizing effects of migration and the strategic adoption of ethnic identities. 

One recurring intuition in the social sciences is that, since ethnic markers 
signal ethnic group membership and ethnic groups are often loci of cooperation, 
markers persist because they allow people to direct altruistic behavior selectively 
toward coethnics (Van den Berghe, 1981; Nettle and Dunbar, 1997). On closer 
analysis, however, this argument turns out not to be cogent. Altruism can evolve 
only if some cue allows altruists to interact with each other preferentially so that 
they receive a disproportionate share of the benefits of altruism. One such cue 
is kinship (Hamilton, 1964), and another is previous behavior (Trivers, 1971; 
Axelrod, 1984). Another idea is that selection might favor altruists who carried 
an external, visible marker that would allow them to limit their cooperation to 
others who exhibited the marker. However, evolutionary theorists argue that 
this mechanism is unlikely to be important (Hamilton, 1964; Grafen, 1990). 
Nonaltruists with the marker do best because they get the benefit without paying 
the cost. Thus, if any process breaks up the association between the cooperator 
strategies and the markers, such individuals will rapidly proliferate and altruists 
will disappear. 
Here we argue that markers function to allow individuals to interact with 
others who share their social norms. We present a simple mathematical model 
showing that marked groups can arise and persist if three empirically plausible 
conditions are satisfied: (1) Social behavior in groups is regulated by norms in such 
a way that interactions between individuals who share beliefs about how people 
should behave yield higher payoffs than interactions among people with 
discordant beliefs. (2) People preferentially interact with people with whom they share 
easily observable traits like dress style or dialect. (3) People imitate successful 
people, with the result that behaviors that lead to higher payoffs tend to spread. 
We also show that the preference to interact with people with markers like one’s 
own may be favored by natural selection under plausible conditions. We conclude 
by outlining several qualitative, empirically testable predictions of our model. 


A Simple Model of the Evolution of Ethnic Markers 

Consider a population divided into a number of large groups. In each time period, 
each individual interacts with another individual from the same group. People’s 
behavior in these interactions depends on culturally acquired beliefs. We will 
refer to this culturally transmitted belief as the behavioral trait. There are two 
alternative beliefs, labeled 1 and 0. Individuals’ payoffs from the social interaction 
depend on their own behavior and the behavior of their partners in the way given 
in table 7.1. This simple coordination game is meant to capture the intuition that 
many real social interactions go well if people have the same beliefs about proper 
behavior. It is likely that human societies face many problems of this kind. 
A familiar example is the set of problems in cross-cultural communication 
that result from different expectations about interactions and codes for 
communicating (Gumperz, 1982). The parameter δ measures the strength of this 
effect. 

We also assume that it is difficult to determine another individual’s beliefs 
about proper behavior before an interaction occurs. Given the large number of 
norms and the fact that some of them will be used only a few times in one’s 
lifetime (Nave, 2000), people cannot always reliably predict the behavior of 
everyone they must interact with or even predict their own behavior, since many 
such norms are unconsciously held. Much the same argument can be made for 
rules enforced by third-party punishment. A stranger who moves to a new village 


Table 7.1. Payoffs in the coordination game 

                            Player 2's behavior 
Player 1's behavior             1           0 
        1                     1 + δ         1 
        0                       1         1 + δ 

Note: Payoffs shown for player 1; δ is assumed to be 
cannot guess ahead of time all of the social rules that regulate behavior in his new 
home. People may be able to tell him some of the things that he needs to know, 
but it is still likely that he will make many costly social blunders, perhaps even 
run afoul of basic moral principles (field anthropologists should be familiar with 
this sort of problem). As long as people are sometimes ignorant in these ways, 
people with uncommon behaviors will be at a disadvantage, and the model 
targets these situations, not the entire scope of interaction. 

Of course, people have many traits, such as dialect, clothing style, and 
cuisine, that can be observed, and often these traits are the basis of assortative 
social interaction. To formalize this idea, we assume that there is also a readily 
observable marker trait. This trait also has variants, labeled 0 and 1, and we 
assume that individuals tend to interact with others who have the same variant 
of marker trait. The strength of this propensity is given by the parameter e. 
When e = 1, individuals interact at random; when e = 0, they always interact 
with someone with the same marker trait. 

There is much evidence that people who do well in life are more likely to be 
imitated (Henrich and Gil-White, 2001). To incorporate this process, we assume 
that the probability that an individual with behavior i and marker j will be 
imitated is proportional to $W_{ij}/\bar{W}$, where $W_{ij}$ is the payoff to that 
combination and $\bar{W}$ is the average payoff in the group. This 
means that combinations of behavior and marker that lead to higher than average 
payoffs will be more likely to be imitated (see Gintis, 2000, for derivation). 
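
As a rough illustration of this imitation rule, the sketch below computes the expected payoff of each behavior-marker combination in the coordination game and then reweights the frequencies by W_ij/W̄. It rests on an assumption that the text does not spell out: with probability 1 - e a partner is drawn from group members who carry the same marker, and with probability e from the group at large. The function names and the dictionary layout are hypothetical.

```python
from itertools import product


def expected_payoffs(x, delta, e):
    """Expected payoff W_ij for each (behavior i, marker j) combination.

    x maps (behavior, marker) -> frequency within one group.
    Assumption (not from the text): a partner shares one's marker with
    probability 1 - e and is drawn at random with probability e.
    """
    p = {i: x[(i, 0)] + x[(i, 1)] for i in (0, 1)}  # behavior frequencies
    q = {j: x[(0, j)] + x[(1, j)] for j in (0, 1)}  # marker frequencies
    w = {}
    for i, j in product((0, 1), repeat=2):
        # chance that a partner with the same marker also has behavior i
        same_marker = x[(i, j)] / q[j] if q[j] > 0 else 0.0
        match = (1 - e) * same_marker + e * p[i]
        w[(i, j)] = 1 + delta * match  # baseline 1, bonus delta when coordinated
    return w


def imitate(x, delta, e):
    """Payoff-biased imitation: each frequency is multiplied by W_ij / W_bar."""
    w = expected_payoffs(x, delta, e)
    w_bar = sum(x[t] * w[t] for t in x)
    return {t: x[t] * w[t] / w_bar for t in x}
```

Under these assumptions, when behavior and marker are uncorrelated the more common behavior gains ground after one round of `imitate`, whereas with a perfect marker-behavior correlation and e = 0 all carriers earn the same payoff and the frequencies do not change.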

With these assumptions it is possible to derive expressions that describe how 
imitation and social interaction change the frequency of the behavior and marker 
traits in each group. The change in the fraction of the people with behavior 1 
within a group, $p_1$, is 

\[ \Delta p_1 = \delta U \{ (p_1 - p_0)(1 - (1 - e) R^2) \} \tag{1} \]

where $R$ ($= D/(UV)^{1/2}$) is the correlation of behavior and marker, U and V are 
the variances of behavior and marker, and D is the covariance between marker 
and behavior. If R = 1, everyone who has marker 1 also has behavior 1; if R = -1, 
then everyone who has marker 1 has behavior 0; and if R = 0, the traits are 
randomly associated. Equation 1 says that if more individuals use behavior 1 than 
behavior 0, it increases; if fewer individuals use it, it decreases. The rate at which 
this occurs depends on whether the marker allows individuals to interact 
preferentially with people who have the same behavior. When $R^2$ is near 1, most 
individuals with a given behavior have the same marker, and if e is small, they 
almost always interact with individuals with the same behavior as themselves, 
and thus there is little advantage in having the common behavior. When $R^2$ is 
near zero, most interactions occur at random and individuals with the most 
common behavior have an advantage. 
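
A small numerical sketch may help make equation 1 concrete. It reads the equation as Δp1 = δU(p1 - p0)(1 - (1 - e)R²) and assumes that U, the variance of a two-valued behavior trait, equals p1·p0. The function name is hypothetical.

```python
def delta_p1(p1, delta, e, R):
    """Change in the frequency of behavior 1 under equation 1 (as read here)."""
    p0 = 1.0 - p1
    U = p1 * p0  # variance of a 0/1 trait (assumed form of U)
    return delta * U * (p1 - p0) * (1.0 - (1.0 - e) * R ** 2)


# The common behavior spreads fastest when markers carry no information (R = 0)
# and more slowly when markers predict behavior well and assortment is accurate.
no_information = delta_p1(0.6, delta=0.5, e=0.25, R=0.0)    # about 0.024
good_markers = delta_p1(0.6, delta=0.5, e=0.25, R=1.0)      # about 0.006
perfect_assortment = delta_p1(0.6, delta=0.5, e=0.0, R=1.0)  # 0.0
```

With R = 1 and e = 0 the change is zero, which matches the verbal argument that accurate assortment on a fully informative marker removes the advantage of the common behavior.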

The change in frequency of marker 1, $q_1$, is approximately given by 
equation (2): 

\[ \Delta q_1 \approx 2 \delta D (p_1 - p_0)(1 - e) \tag{2} \]

This expression is valid when the covariance between marker and behavior is 
small, that is, when individuals' markers predict little about their behavior. When D is 
positive, marker 1 is associated with behavior 1, and if behavior 1 increases, so 
does marker 1. The complete expression for the change in $q_1$ shows that this 
effect decreases as D becomes larger. 

Because the effects of social interaction and learning depend critically on the 
covariance between behavior and marker (D), we also need to know how they 
affect the covariance. Social interaction and imitation increase covariance be- 
tween marker and behavior when the covariance is small. The reason is simple: 
individuals with the most common combinations of behavior and marker are 
more likely to interact with others with the same behavior and thus achieve a 
higher payoff. 

We then represent population mixing due to intermarriage, relocation, and 
other factors with a migration phase that removes a proportion m of each group 
and replaces it with migrants drawn from neighboring groups. Clearly, such 
mixing will reduce the differences in the frequencies of both behavior and 
marker between neighboring groups. However, migration also has a less obvious 
and very important effect: as long as there is any difference in the frequencies of 
marker and behavior between neighboring groups, migration increases the co- 
variance between marker and behavior within groups: 

\[ \Delta D = m \{ \bar{D} - D + (p_1 - \bar{p}_1)(q_1 - \bar{q}_1) \} \tag{3} \]

where $\bar{p}_1$, $\bar{q}_1$, and $\bar{D}$ are the average frequencies of behavior and marker and the 
covariance between behavior and marker in neighboring groups that provide 
immigrants. To understand why mixing increases the covariance within groups, 
consider the case in which the frequency of marker and behavior is 0.9 in one 
group and 0.1 in a second group. Further suppose that the covariance between 
marker and behavior within both groups is zero, and therefore the marker is 
useless as a predictor of behavior. Now suppose that we mix the two groups 
completely. Most of the individuals coming into the first group will carry both 
marker and behavior 0, while those coming into the second will carry both 
marker and behavior 1 . The frequency of both markers and both behaviors will 
be 0.5, but most (82%) of the individuals in the population will be either 1,1 or 
0,0, with the result that markers are now good predictors of behavior within 
groups. 
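
The arithmetic behind this example can be checked directly. The sketch below builds the joint frequencies of behavior and marker in each group under the stated assumption of zero within-group covariance, mixes the two groups in equal proportions, and recomputes the summary quantities. The helper name is hypothetical.

```python
def joint_freqs(p1, q1):
    """Joint (behavior, marker) frequencies when the two traits are independent."""
    return {(b, k): (p1 if b else 1 - p1) * (q1 if k else 1 - q1)
            for b in (0, 1) for k in (0, 1)}


group1 = joint_freqs(0.9, 0.9)  # behavior 1 and marker 1 both at 0.9
group2 = joint_freqs(0.1, 0.1)  # behavior 1 and marker 1 both at 0.1

# Complete mixing of two equally sized groups
mixed = {t: 0.5 * group1[t] + 0.5 * group2[t] for t in group1}

p1 = mixed[(1, 0)] + mixed[(1, 1)]        # frequency of behavior 1
q1 = mixed[(0, 1)] + mixed[(1, 1)]        # frequency of marker 1
matching = mixed[(1, 1)] + mixed[(0, 0)]  # share of 1,1 plus 0,0 individuals
D = mixed[(1, 1)] - p1 * q1               # covariance of behavior and marker
```

Here `p1` and `q1` both come out at 0.5, `matching` at 0.82, and `D` at 0.16, even though the covariance inside each group was zero before mixing.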

Finally, suppose that individuals sometimes acquire marker and behavior 
traits from different individuals, which leads to the randomization of behavior 
and marker — a process we term recombination. Recombination has no effect on 
the frequencies of behavior and marker, but it reduces the covariance between 
marker and behavior at a rate proportional to r. 


Simulation Results 

We have derived recursions that give the net effect of imitation, migration, and 
recombination on the frequencies of behavior and marker and the covariance 
between them. However, these recursions are too complex to solve analytically, 
and we have, therefore, relied on numerical simulation. We begin by describing 
simulations of the model when there are only two interacting populations. This 
system provides an intuition for the processes that sometimes give rise to marked 
groups. We then explore the parameter space of the model, varying e (the 
chance of interacting at random), m (migration), δ (the effects of social behavior 
on individual welfare), and r (the rate of recombination) to map the range of 
conditions under which marked groups arise. Finally, we generalize the model, 
allowing larger numbers of populations and a general coordination game struc- 
ture. These analyses suggest that the simple model is relatively robust. 

1. Stable behavioral differences between groups usually become ethnically 
marked. Social interaction alone can lead to the evolution of stable differences in 
behavior between two groups. People with more common behaviors achieve 
higher payoffs in the coordination game and are more likely to be imitated. Thus, 
if one behavior is initially common in one group and the alternative behavior is 
initially common in the other group, payoffs from social behavior coupled 
with imitation of the successful will cause the groups to become more different. 
If the diversifying effect of payoff-biased imitation is sufficiently strong com- 
pared with the homogenizing effect of migration, the two populations will reach 
an equilibrium at which behavior 1 is common in group 1 and behavior 0 in 
group 2. In contrast, if the rate of mixing is too high or if initially the same 
behavior is common in both populations, only one behavior will be present in 
both populations at equilibrium. 
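
The balance between payoff-biased imitation and migration can be illustrated with a deliberately stripped-down sketch that ignores markers altogether. It assumes that imitation follows equation 1 with R = 0, so that each group's frequency of behavior 1 changes by δp(1 - p)(2p - 1), and that the two groups then exchange a fraction m of their members. The function names and this simplified update are assumptions for illustration, not the recursions used in the chapter.

```python
def step(p, delta, m):
    """One generation for two groups: imitation, then symmetric migration.

    p is a list [p_group1, p_group2] of frequencies of behavior 1.
    """
    # Imitation: equation 1 with R = 0, using U = p(1 - p) and p1 - p0 = 2p - 1
    after = [x + delta * x * (1 - x) * (2 * x - 1) for x in p]
    # Migration: each group receives a fraction m from the other group
    return [(1 - m) * after[0] + m * after[1],
            (1 - m) * after[1] + m * after[0]]


def run(p, delta, m, generations=2000):
    """Iterate the two-group recursion for a fixed number of generations."""
    for _ in range(generations):
        p = step(p, delta, m)
    return p


weak_mixing = run([0.55, 0.45], delta=0.5, m=0.01)   # groups stay different
strong_mixing = run([0.60, 0.45], delta=0.5, m=0.2)  # one behavior takes over
```

In this sketch, weak mixing leaves behavior 1 common in the first group and rare in the second, while strong mixing drives both groups to the same behavior, in line with the verbal description above.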

If stable behavioral differences between groups exist, each behavior can 
become associated with a different marker variant — behavior 1 will, for example, 
be associated with marker 0 and behavior 0 with marker 1. Figure 7.1 illustrates 
this dynamic. Initially behavior 1 is more common in population 1 and less com- 
mon in population 2. Marker 0 is initially more common than marker 1 in both 
populations but relatively more common in population 1 than in population 2. 


Figure 7.1. The frequencies of each of the four combinations of behavior 
and marker over time in each of two populations for m = 0.025, e = 0.25, 
and r = 0.1. The behaviors are denoted by the shape of the symbol, circle 
(= 0) or square (= 1), and the markers are denoted by color, black (= 0) or 
white (= 1). Initially behavior 1 (squares) has frequency 0.55 in 
population 1 and 0.45 in population 2. Marker 0 (black) is initially more 
common than marker 1 in both populations but relatively more common 
in population 1 (frequency 0.8) than in population 2 (frequency 0.7). 
There is no initial covariance within populations. At first, rare-type disadvantage 
causes behavior 1 to become more common in population 1 and behavior 0 in 
population 2. At the same time, migration generates a negative covariance 
between marker and behavior so that behavior 1 tends to co-occur with marker 0 
and behavior 0 with marker 1. This in turn strengthens the forces increasing the 
differences between the populations in frequencies of marker and behavior, 
which then generates greater covariance. This positive feedback process (figure 
7.2) continues until a symmetrical equilibrium is reached at which a different 
behavior is common in each population and each behavior is associated with a 
different marker. The adaptive behaviors have become symbolically marked, 
even though the same marker was initially common in both groups. 

However, migration and recombination oppose the positive feedback process 
described. Migration tends to make the two populations the same, equalizing 
the frequency of the markers in each population, and recombination destroys 
the covariance between marker and behavior. If recombination is strong, it dis- 
sipates the covariance between marker and behavior more rapidly than migra- 
tion and imitation can create it. Even though the payoff advantage of being in 
the majority is sufficient to maintain behavioral differences between the two 
populations, these differences do not become ethnically marked. When in- 
dividuals are unable to assort accurately on the basis of markers (e is large), the 
pattern is similar: stable group differences in behavior may emerge and persist, 
but selection on markers is too weak to generate covariance between marker and 
behavior. 

The qualitative arguments are supported by systematic sensitivity analysis. 
We determined the range of parameters under which groups become marked by 
performing a large number of simulations. For each simulation we calculated the 
value of D, the population average covariance between behavior and marker, 
averaged over the 100 simulations. We held parameter values constant at 
m = 0.01, e = 0.3, r = 0.01, δ = 0.5 for parameters not varied in a run of 
simulations. Figure 7.3 summarizes these results. When biased imitation can maintain 
stable behavioral differences in the face of migration, stable marker differences 
evolve provided that (1) recombination (r) is not too strong and (2) individuals 
interact sufficiently often with individuals like themselves (e is not too high). 
There are no cases in which behavioral differences fail to evolve and marker 
differences manage to become stable. 

Figure 7.2. The feedback process that generates marked groups and the forces that 
oppose this process. 

2. Spatial structure is needed to generate ethnic markers but not to maintain 
them. Migration between groups generates the initial covariance essential for the 
evolution of ethnic markers. However, if individuals are able to use markers 
to assort accurately (e ≪ 1), spatial structure is no longer necessary to maintain 
ethnic markers once such covariance arises (figure 7.4) and groups end up mixed 
together in space, but high covariance between markers and behaviors remains. 
This configuration can be a stable equilibrium only if r and e are very small. 
However, for somewhat larger values of r and e, there is a long transition period 
during which two ethnically marked types are present without spatial variation. 
A more complex model in which groups occupied different niches would likely 
be able to sustain spatially mixed ethnically marked groups in a wider range of 
circumstances. Also, we will demonstrate later that natural selection would re- 
duce values of r and e if at all possible. This makes the possibility of the evolution 
of such spatially blended systems more likely. Such situations are an interesting 
and unexpected outcome of our model. 

3. Increasing the number of populations increases the range of initial conditions 
that give rise to ethnic markers. Random starting conditions (random frequencies 
of behavior and marker in each group) often lead to the evolution of behaviorally 
different and marked groups, and this result becomes more likely as more groups 
are added to the system (figure 7.5). The two-group system is most sensitive to 
starting conditions, as this case has the highest chance of randomly generating all 
groups with similar initial behavior frequencies. 

Figure 7.3. The evolution of stable marker differences. White regions are combinations of 
parameter values that produced both stable behavioral and marker differences (that is, 
these populations became ethnically marked). Black regions are cases in which behavioral 
differences were stable but marker differences were not (that is, these populations 
became culturally different but without ethnic markers). Gray regions are cases in which 
behavioral differences failed to evolve, typically because of strong migration. 






Figure 7.4. The frequencies of each of the four combinations of behavior and marker 
over time in each of two populations. The behaviors are denoted by the shape of the 
symbol, circle (= 0) or square (= 1), and the markers are denoted by color, black (= 0) 
or white (= 1). The initial conditions and value of m are the same as in figure 7.1, but now 
assortment is perfect, e = 0.0, and there is no recombination, r = 0.0. As before, at first 
rare-type disadvantage causes behavior 1 to become more common in population 1 
and behavior 0 in population 2, and migration generates a negative correlation between 
marker 1 and behavior 0 (equation 4). However, because there is no recombination, 
this covariance builds up much more rapidly, especially in population 1, in which the 
initially relatively more common marker was also absolutely more common. The high 
correlation between marker and behavior combined with the accurate assortment 
eliminates rare-type disadvantage, and migration mixes the two groups until they are identical. 
Because the covariance increases more rapidly in population 1, the marker-behavior 
variant in population 2 experiences a transient advantage that is preserved at equilibrium. 
Figure 7.5. Equilibrium absolute values of D (covariance in the population as a whole) 
for simulations involving two groups (top, 100 simulations) and six groups (bottom, 
100 simulations). Starting conditions were random with parameter values m = 0.025, 
r = 0.10, e = 0.30, δ = 0.50. High D becomes more likely as the number of groups 
increases. 


4. Group differences are strongest at boundaries. When more than two
groups are arrayed in space, the correlation between marker and behavior
(R = D_i/√(U_i V_i)) is greatest at the boundaries between culture areas. Figure 7.6
shows the steady state in ten populations arranged in a stepping-stone ring. This
steady state results from an initial clinal distribution of behavior and marker
frequencies with zero correlation between behavior and marker in each population.
There is a region of three populations in the middle in which the
frequency of marker 1 and behavior 1 is low and a region of three populations
at the edges in which these frequencies are high (remember that the populations
wrap around so that population 1 exchanges migrants with population
10). In both of these regions there is little or no correlation between marker
and behavior. In between these regions are boundary areas in which frequencies
are intermediate and there is substantial correlation between marker and behavior.

Figure 7.6. The steady state that arises from slightly clinal initial distributions of the
frequencies of marker 1 and behavior 1 in ten populations arranged in a ring. Broken line,
p_i; heavy solid line, q_i; light solid line, R.

5. A more general model of social interaction leads to similar results. So far, 
we have assumed that social interaction can be modeled by a game of pure coor- 
dination with equal average payoffs for both equilibria. Symmetric, pure co- 
ordination games are very special because the basins of attraction of the two 
equilibria are the same size. To test whether our results were sensitive to this 
assumption, we ran a number of simulations in which we varied the parameters 
of the completely general two-person coordination game shown in table 7.2. 
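
The claim about basin sizes can be checked with a short calculation. The Python sketch below is an illustration only, not part of the original analysis: it assumes that partners are drawn at random from a population in which a fraction p plays behavior 1 (markers and migration are ignored), and it uses the payoffs of table 7.2 with the function names `payoff`, `expected_payoff`, and `basin_boundary` chosen here for convenience. Setting the two expected payoffs equal gives the boundary p* = (δ + h)/(2δ + g + h).

```python
from fractions import Fraction


def payoff(own, other, delta, g=0, h=0):
    """Player 1's payoff for (own behavior, partner's behavior), as in table 7.2."""
    table = {
        (1, 1): 1 + delta + g,
        (1, 0): 1 - h,
        (0, 1): 1,
        (0, 0): 1 + delta,
    }
    return table[(own, other)]


def expected_payoff(own, p, delta, g=0, h=0):
    """Expected payoff of `own` when a fraction p of partners play behavior 1.

    Assumption: partners are met at random (no assortment on markers).
    """
    return p * payoff(own, 1, delta, g, h) + (1 - p) * payoff(own, 0, delta, g, h)


def basin_boundary(delta, g=0, h=0):
    """Frequency of behavior 1 at which both behaviors earn the same payoff.

    Requires delta + g > 0 and delta + h > 0, so that both pure equilibria are stable.
    """
    return Fraction(delta + h) / Fraction(2 * delta + g + h)


if __name__ == "__main__":
    delta = Fraction(1, 2)
    print("symmetric game:", basin_boundary(delta))
    print("with g = 1/2:  ", basin_boundary(delta, g=Fraction(1, 2)))
```

With g = h = 0 the boundary sits at one half, so the two basins are the same size. A positive g moves the boundary below one half, which enlarges the basin of the equilibrium at behavior 1.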

The results indicate that the system regularly evolves toward marked, be- 
haviorally distinct groups even when there are large deviations from the perfect 
coordination structure. Thus, our results do not depend in a sensitive way on the 
perfect nature of the game structure we have chosen. This suggests that any 
stable behavioral equilibria, regardless of their relative consequences for group or 
individual welfare, may become marked.

Table 7.2. Payoffs in a general two-person game with two stable equilibria

                         Player 2’s behavior
Player 1’s behavior      1              0
1                        1 + δ + g      1 - h
0                        1              1 + δ

Note: Payoffs shown for player 1; δ, g, and h are

Evolutionary Stability of the Parameters 

This model depends on four parameters: m, δ, r, and e. The first two formalize
assumptions about the ecology of the evolving populations. The second pair of 
parameters represents assumptions about human psychology. The simulation 
results indicate that social interactions in which common behaviors have high 
payoff will lead to the evolution of ethnic markers if both e and r are small, or, in 
other words, if people have a psychology that predisposes them to interact with 
individuals with the same marker as themselves and to acquire some markers and 
behaviors as a package. Natural selection will, all other things being equal, fa- 
vor such a psychology (that is, selection will favor mutations that reduce the 
values of e and r). However, selection on other aspects of social learning and 
demands on interaction may restrict the extent to which selection can reduce 
these parameters. 


Discussion 

We have argued that ethnic markers do not function to allow individuals to 
direct altruism to others like themselves because such a system cannot resist 
invasion by cheaters who signal altruistic intent but then do not deliver. In 
contrast, ethnic markers can signal one’s behavioral type when social interactions 
have a coordination structure because in such situations there is nothing to be 
gained from cheating. Both parties in the coordination setting gain the most 
when they honestly advertise their strategy, and as a result both the behavior and 
its advertisement spread when the successful are imitated. Axtell, Epstein, and 
Young (1999) have analyzed another model that is quite different structurally 
but works for similar reasons. 

The intuition that ethnic markers and cooperation are related is not, 
however, without merit. Humans are peculiar in that we often cooperate with 
large numbers of unrelated individuals. As we have argued, the existence of 
ethnic markers alone cannot explain the scale of human cooperation. Yet we 
have shown that markers may evolve when individuals interact in a two-person 
coordination game, and we believe that any process that leads groups to occupy
multiple stable equilibria may produce the same result. Two of us have argued
at length elsewhere that human cooperation results from norms enforced by
socially created rewards and punishments (Boyd and Richerson, 1990, 1992;
Soltis, Boyd, and Richerson, 1995; Richerson and Boyd, 1998, 1999). If punishment
is sufficiently costly, such systems can stabilize a very wide range of
behavior. Then, competition between groups will lead to the spread of moral 
systems that enhance group survival, welfare, and expansion, including norms 
that lead to enhanced cooperation in economic and military activities. 

As a result, we expect that systems of moral norms, some of which create 
group-beneficial cooperation, should come to be marked by ethnic markers by 
the process described. Punishment transforms the prisoner’s dilemma structure 
of a cooperation problem into a coordination structure. The process we have 
described here can then lead to individuals selecting individuals with whom to 
cooperate on the basis of markers, but the markers themselves do not stabilize 
the cooperation. 


Corollaries and Predictions 

The goal of this kind of modeling study is to demonstrate the cogency of a 
deductive argument linking assumptions about microlevel social interactions to 
the empirically observable macrolevel social patterns that result. Accordingly, 
we conclude by describing several testable predictions of the model. 

Our analysis of the evolutionary stability of e and r makes two predictions 
about the psychological tendencies of human beings: 

1. Individuals in marked communities should prefer interaction with similarly
marked individuals. Our analysis of the evolution of e, the rate at which in- 
dividuals interact at random with respect to markers, suggests that natural se- 
lection or an analogous process operating on cultural rules for interaction should 
reduce e to zero, if possible. Thus, to the extent that e represents a psychological 
bias toward interacting with those who look like oneself rather than the ability or 
freedom to interact with ones like oneself, we expect members of marked com- 
munities to prefer individuals marked like themselves, at least when it comes to 
coordination interactions. 

2. Individuals in marked communities should acquire bundles of at least some 
norm and marker traits. While the model does not suggest anything about the 
social learning of noncoordination behaviors and social markers, our analysis of 
the evolution of r, the rate of recombination of behavior and marker traits, 
predicts that, for our model to be relevant, individuals should acquire norm and 
marker traits as a bundle. They should also preserve these associations through- 
out substantial portions of their life spans. If this is not true, the process we 
describe here is unlikely to work. 

The model makes three clear predictions about the nature of the distribu- 
tions of marker traits and their relations to ethnic groups and their histories: 

1. Ethnic differences should be stronger at boundary regions than deep within
ethnic territories. Hodder (1977) suggests that this is true for some ethnoarcheological
data from the Lake Baringo region of Kenya, but the data are
inadequate to test this prediction. The appropriate test would be examination of
a large ethnic group, such as the Kikuyu of Kenya, which interacted at many border
areas with a number of different ethnic groups. Another setting that holds
promise for testing this prediction is fragmentary migration that brings smaller 
units of a larger ethnic population into contact with other ethnic groups. If these 
groups are on average more marked than their source populations, we may be 
able to conclude that interaction with the other ethnic groups has increased 
selection on markers and magnified initial differences in those settings. 

2. Norm and marker boundaries should coincide, while the distributions of other 
culture items may map onto one another differently. Our model makes no predic- 
tions about the nature of all cultural traits and the distribution of ethnic markers. 
However, if this model is correct, a number of norm differences — on beliefs in 
inheritance, child rearing, household labor, and other categories of human life in 
which there are multiple coordinated solutions to the same problem — should 
correspond to the distributions of marker differences. 

3. Potential marker traits with the greatest initial differences should become 
marked first. One test of this prediction would be to examine ethnographic 
settings in which two isolated source populations have contributed migrant 
groups that have since been in contact for some time. The source populations 
provide estimates of the initial differences in the migrant groups when they came 
into contact. The migrant groups provide estimates of the differences that might 
have grown from those initial differences. This prediction will earn support if the 
traits with greater differences between source populations appear to have led to 
marked traits in the contact groups. 


NOTE 

Supplementary material appears in the electronic edition of Current Anthropology 
44 (2003) on the journal’s web page (http://www.journals.uchicago.edu/CA/ 
home.html).


PART 3

Human Cooperation, Reciprocity,
and Group Selection


A number of years ago the Cambridge paleoanthropologist Rob 
Foley published a book on the evolutionary ecology of early hominins entitled 
Another Unique Species. The title was meant to capture the idea that while 
humans are unique in many ways, so too is every other species. We like the 
book very much, but perhaps the title is a bit misleading. Humans are, if you 
will allow us, “more unique” than any other primate. We are extreme outliers 
in our use of tools, in our ecological and geographical range, in the richness of 
our communication system, and so on and on. Perhaps the most singular 
feature of Homo sapiens is the scale on which humans cooperate. In most other 
species of mammals cooperation is limited to close relatives and (maybe) 
small groups of reciprocators. After weaning most individuals acquire 
virtually all of the food that they eat. There is little division of labor, no trade, 
and no large-scale conflict. Amend Hobbes to account for nepotism, and 
his picture of the state of nature is not so far off for other mammals. In 
contrast, people in even the simplest human societies regularly cooperate with 
many unrelated individuals. Sharing leads to substantial flows of food and 
other resources among different age and sex classes. Division of labor and 
trade are prominent features of every historically known human society, 
and archaeology indicates that such trade has a long history. Violent conflict 
among groups is also quite common. Since the development of agriculture 
10,000 years ago, the scale of human cooperation has steadily increased so that 
most people on earth today are enmeshed in immense cooperative institutions 
like universities, business firms, religious groups, and nation states. Moreover, 
experimental work, both in psychology and economics, indicates that people 
have social preferences that incline them to such cooperation (see Fehr and
Fischbacher, 2003, for a review). In the laboratory, people behave altruistically
in anonymous one-shot interactions, sometimes for very large stakes.

Thus, we have an evolutionary puzzle. At some time in the not so distant 
past, say 5 million years ago, our ancestors lived in small kin-based societies 
like other apes. Then, sometime between then and now, human psychology 
changed in such a way that large-scale cooperation became common. What 
were the evolutionary processes that gave rise to this change? 

Ever since we started thinking about cultural evolution, we have thought 
that culture might provide the solution to this puzzle because it seems to 
generate lots of variation in social behavior among social groups. In other 
primate species there is little heritable variation among groups within a species. 
The behavior of groups depends on the habitat and ecology, the demo- 
graphic structure, and the personalities of particular individuals. But these 
differences are small and ephemeral, and, as a consequence, group selection at 
the level of whole primate groups is not an important evolutionary force. In 
contrast, it is an empirical fact that there is much heritable cultural variation 
among human groups. Neighboring groups often have different languages, 
marriage systems, and property rights, and these differences persist for 
generations. This suggested to us that group selection might be a more 
important process shaping human behavior than the behavior of other animals. 
We have devoted quite a bit of our research effort to trying to gain a clearer 
understanding of this puzzle. This work is usefully divided into two parts. 

Studies of cultural group selection. First, we have studied models of cultural 
group selection and attempted to collect empirical data necessary to deter- 
mine whether the models are close to reality. We believe that the case for 
cultural group selection is strong. 

Studies of the evolution of contingent cooperation. Many scholars in the 
evolutionary social science community believe that human cooperation is 
better explained by selection within groups that favored various forms 
of contingent cooperation. The idea is that during most of our evolutionary 
history, humans lived in small groups in which reciprocity and moralistic 
punishment supported cooperation. The psychological machinery that sup- 
ported these behaviors “misfires” in the larger societies of the last 10,000 
years. We have been skeptical about this argument because many other 
mammals live in small social groups, yet none of them shows very much 
evidence of contingent cooperation beyond pairwise reciprocity. It seemed 
to us that the advantages created by wider cooperation within groups like 
specialization, division of labor, risk spreading, and so on are huge, and 
lineages like ants and termites in which kin selection supports cooperation 
have been extremely successful. Thus, it seemed to us that if contingent 
cooperation could generate larger-scale cooperation, there ought to be lots of 
examples in nature. However, when we started thinking about this problem 
in the early 1980s, there was lots of work on the evolutionary theory of 
reciprocity among pairs of individuals, but very little about contingent 
cooperation in larger groups. So we undertook to develop theory in this area, 
and the results are reprinted here.


Studies of the Evolution of Contingent Cooperation

The modern theory of the evolution of reciprocity began in 1971 when 
Robert Trivers showed that contingent cooperation could be evolutionarily 
stable. His model goes roughly as follows: suppose that pairs of individuals 
interact repeatedly over time and that occasionally one member of a pair has 
the opportunity to provide a benefit, b, to the other at a cost, c, to itself. Now 
consider a population of reciprocators who help on the first interaction and 
keep helping as long as their partner helps. Trivers (apparently with help 
from W. D. Hamilton) showed that reciprocators can resist invasion by rare 
defectors who never help as long as the long-run benefit of mutual cooperation 
is greater than the short-run benefit that a defector gets by exploiting a cooperator.
(Or, more formally, when t(b - c) > b, where t is the average number
of helping opportunities for each pair of individuals.) This article has been
widely cited and was the impetus for much empirical work on reciprocity. 
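
The stability condition t(b - c) > b is easy to explore numerically. The following Python sketch is only an illustration under the assumptions stated in the text (a fixed benefit b, a fixed cost c, and an average of t helping opportunities per pair). The function names are hypothetical and do not come from Trivers's article.

```python
from fractions import Fraction
from math import floor


def reciprocity_is_stable(b, c, t):
    """True when reciprocators resist rare defectors, i.e. when t(b - c) > b."""
    return t * (b - c) > b


def fewest_interactions(b, c):
    """Smallest whole number of helping opportunities t satisfying t(b - c) > b."""
    if b <= c:
        raise ValueError("mutual help pays only when the benefit exceeds the cost")
    # t must be strictly greater than b / (b - c); exact rational arithmetic
    # avoids floating-point trouble when the ratio is a whole number.
    return floor(Fraction(b) / Fraction(b - c)) + 1


if __name__ == "__main__":
    for b, c in [(2, 1), (5, 1), (10, 9)]:
        print(f"b={b}, c={c}: at least {fewest_interactions(b, c)} interactions")
```

When the benefit is only slightly larger than the cost, as in the last case, many more repeat interactions are needed before reciprocity can resist defectors.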

However, there is a big problem with this analysis: when individuals 
interact repeatedly, reciprocity is evolutionarily stable, but so is everything 
else. Unbeknownst to Trivers and most other biologists working on reci- 
procity, game theorists in economics, political science, and mathematics had 
been working on the closely related problem of rational behavior in repeated 
games. As Trivers noted in his article, his model of reciprocity can be for- 
malized as a repeated version of the famous prisoner’s dilemma game. What 
Trivers apparently did not know is that by the late 1950s game theorists 
had proved that in a repeated prisoner’s dilemma (or, in fact, in any repeated
game in which players can strongly affect each other’s payoffs) any pattern
of behavior can be sustained by mutual self-interest: all cooperation for sure,
but also all defection, or anything in between, as long as interactions go on long
enough. This important result was known as the “folk theorem” because
nobody in the game theory community was exactly sure who first proved it, 
and though the theorem was widely known in that community, it wasn’t 
actually published until 1986 (Fudenberg and Maskin, 1986). The basic logic
of the folk theorem is simple. Suppose a strategy takes the form: do x, where 
x is some behavior, say alternating cooperate and defect, as long as the other 
guy does x. If the other guy does something else, defect forever. Once a 
strategy like this becomes common in a population, the only smart thing to do 
is x; otherwise, one will be punished by defection for the rest of the interaction.
If interactions go on long enough, the costs of such punishment will
exceed the short-run benefits of doing something other than x. Repeated 
interactions create the possibility of sanctions and any behavior that enough 
sanctioners are willing to sanction is an equilibrium. For the most part, the 
logic of the folk theorem applies to evolutionary theory, although a subtle and 
important difference affects the stability of punishment. We will return to 
this issue. The bottom line is that when everything is an equilibrium, showing
that reciprocity is an equilibrium too doesn’t really tell you much. We need
to know which equilibria are likely evolutionary outcomes and which are not.

In 1981 Robert Axelrod and W. D. Hamilton published an article in
Science that showed that reciprocating strategies were, in fact, the most likely 
evolutionary outcome. Standard game theory assumes that people seek to
maximize their average payoff. In evolutionary terms, this is equivalent to assuming
that groups of interacting individuals are formed at random with respect
to genotype. (When individuals interact at random, their actions do not
change the relative fitness of other types in the population. Thus, all that
matters is the effect of behavior on an individual's own fitness.) Reciprocators,
or, more precisely, individuals with genes that cause them to reciprocate, are
as likely to initially interact with defectors (i.e., individuals with defector
genes) as are other defectors. This is not a bad assumption for a large, mobile
mammal like humans, because there is ample gene flow among social groups 
and, to a rough approximation, individuals do interact at random. However, a 
better approximation is to assume a small tendency to interact with genetically 
similar individuals. Reciprocators are slightly more likely to interact with 
other reciprocators than defectors are. Axelrod and Hamilton showed that 
even small amounts of assortative interaction allowed reciprocal strategies to 
invade when rare and stabilized them when common. The reason is easy to see. 
When strategies interact at random, and defection is common, there is no 
chance that individuals carrying rare reciprocating genes will meet. So the long- 
run benefits associated with sustained cooperation are irrelevant. Reciprocators 
get exploited, and that is that. However, when there is some assortative in- 
teraction, rare reciprocators do occasionally meet, and if the long-run benefits 
of cooperation are big enough, even a small amount of assortment can cause the 
average fitness of reciprocators to exceed the average fitness of defectors. To 
see the strength of this effect, suppose that b/c = 2, helping behavior that would 
be favored only among full siblings. The following table calculates the amount 
of assortment necessary to cause reciprocating strategies to increase when rare. 
At even a modest number of interactions, the threshold value is very small. 

In dyads, a little kinship and a little repeat business can generate a lot of 
cooperation. 


Expected number of interactions   1     3     7      15      49
Threshold value of r              .5    .25   .125   .0625   .02
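
The thresholds in the table can be reproduced with a few lines of Python. The sketch below rests on an assumption about the underlying calculation, namely that a rare reciprocator meets another reciprocator with probability r and a defector otherwise, so that reciprocators increase when r·t(b - c) - (1 - r)c > 0, which gives the threshold r = c/(t(b - c) + c). The function name is hypothetical.

```python
from fractions import Fraction


def threshold_assortment(b, c, t):
    """Smallest assortment r at which rare reciprocators break even.

    Assumed condition (see lead-in): r * t * (b - c) - (1 - r) * c > 0,
    which rearranges to r > c / (t * (b - c) + c).
    """
    b, c, t = Fraction(b), Fraction(c), Fraction(t)
    return c / (t * (b - c) + c)


if __name__ == "__main__":
    # b/c = 2, as in the text; only the ratio matters for the threshold.
    for t in (1, 3, 7, 15, 49):
        print(t, float(threshold_assortment(2, 1, t)))
```

With b/c = 2 the threshold simplifies to 1/(t + 1), which matches each column of the table.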


Axelrod and Hamilton were also concerned that reciprocating strategies 
could do well in more complex social environments in which many different 
strategies were common. They famously championed a particular reciprocating 
strategy, tit-for-tat, showing that it did well in computer tournaments against a 
wide range of strategies. Subsequent research has shown that tit-for-tat is really 
not such a good strategy if individuals make mistakes. Other reciprocating 
strategies such as “contrite tit-for-tat” (Sugden, 1986; Boyd, 1989) and
“Pavlov” (Boerlijst, Nowak, and Sigmund, 1998) are really more robust.
Nonetheless, their basic conclusion holds true. Given quite plausible 
assumptions, reciprocating strategies can increase when rare, can continue to 
increase under a range of assumptions, and can persist when common.

Axelrod and Hamilton’s (1981) article, and most of the work that
followed it, deals with reciprocity among pairs of individuals. Many authors
interested in human behavior have assumed that the conclusions of this work
can be extended to cooperation in larger groups (e.g., Trivers, 1971). We
know from everyday experience that groups of people can organize contingent 
cooperation. Committees, sports teams and many similar groups work that 
way. So even though the theory applies to pairs, the general result seems to 
apply to larger groups. Several chapters included here resulted from checking 
to see if the theory of evolution of contingent cooperation applies to larger 
groups. 

In our first effort (chapter 8), we extended the Axelrod-Hamilton analysis
to groups of people repeatedly interacting in an n-person prisoner’s dilemma.
During each interaction, individuals can cooperate, producing a benefit, b/n,
for all players including themselves at a cost, c, to themselves. Thus, if everyone
cooperates, they achieve a long-run payoff, t(b - c). As in the two-person
case, however, defectors achieve a short-term payoff, now b(n - 1)/n,
by free-riding on the cooperative payoffs of others. We consider a family of
reciprocating strategies that generalize tit-for-tat to larger groups. Namely, the
strategy T_i cooperates on the first interaction and on subsequent interactions if
at least i of the n - 1 other individuals cooperated during the previous interaction.
Thus, T_0 individuals always cooperate; T_{n-1} individuals cooperate only if everyone else
cooperated on the previous turn.
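
The payoff structure and the T_i family can be sketched in a few lines of Python. This is a minimal illustration rather than the model analyzed in chapter 8: it assumes a fixed number of rounds, no errors, and a threshold rule in which T_i cooperates when at least i of the other players cooperated on the previous turn. The function name `play_group` and the parameter values are hypothetical.

```python
def play_group(thresholds, b, c, rounds):
    """Total payoffs in a repeated n-person prisoner's dilemma.

    thresholds[k] is i for a T_i player, or None for an unconditional defector.
    Each cooperator pays c and adds b/n to every player's payoff, itself included.
    """
    n = len(thresholds)
    payoffs = [0.0] * n
    # T_i players cooperate on the first interaction; defectors never do.
    moves = [th is not None for th in thresholds]
    for _ in range(rounds):
        cooperators = sum(moves)
        for k in range(n):
            payoffs[k] += cooperators * b / n - (c if moves[k] else 0)
        # T_i cooperates next time if at least i of the n - 1 others cooperated.
        moves = [
            th is not None and (cooperators - moves[k]) >= th
            for k, th in enumerate(thresholds)
        ]
    return payoffs


if __name__ == "__main__":
    b, c, rounds = 8, 3, 10  # illustrative values with b/n < c < b for n = 4
    print("all T_3:          ", play_group([3, 3, 3, 3], b, c, rounds))
    print("T_3 plus defector:", play_group([3, 3, 3, None], b, c, rounds))
    print("T_2 plus defector:", play_group([2, 2, 2, None], b, c, rounds))
```

In the second case the intolerant T_3 players stop cooperating after one round, so the defector gains only the one-time payoff b(n - 1)/n. In the third case the tolerant T_2 players keep cooperating, and the defector out-earns them in every round.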

The equilibrium behavior of this model is qualitatively similar to the 
two-person case. As always, defection is evolutionarily stable. Contingent 
cooperation can be evolutionarily stable, but only if reciprocating strategies do 
not tolerate defection. A population in which the strategy T_{n-1} is common will
resist invasion by rare mutant defectors if the long-run benefit of cooperation
exceeds the short-term advantage of free-riding. However, none of the other
more tolerant reciprocating strategies can resist invasion by defectors. For
example, when T_{n-2}, the strategy that tolerates one defector in its group, is
common, rare defectors will get the long-run benefits of cooperation without
paying the cost and thus will increase in frequency. It turns out that
strategies like T_{n-2} that tolerate a few defectors can persist in mixed stable
equilibria with defectors, but interactions must go on for a very long time.
Thus, like the two-person case, virtually any kind of behavior can be evolutionarily
stable.

Our analysis of this model indicates that as groups get bigger, reciprocity 
becomes a much less likely evolutionary outcome. Once again, suppose 
that interacting groups are formed assortatively of relatives with degree of 
relatedness r. Then rare reciprocators using the potentially evolutionarily 
stable strategy T_{n-1} can invade if

\[
\underbrace{\bigl(r(n-1)+1\bigr)\frac{b}{n} - c}_{\text{inclusive fitness}}
\;+\; \underbrace{r^{\,n-1}(t-1)(b-c)}_{\text{reciprocity}} \;>\; 0
\]

The first term on the left-hand side gives the inclusive fitness of rare
reciprocators during the first interaction. If it is positive, cooperation pays
even without reciprocity. The second term gives the increase in the fitness
of reciprocators due to ongoing interactions in those groups in which
reciprocation is sustained. As in the two-person case, this term increases
linearly with the average number of interactions (t): repeat business makes
reciprocation pay. However, also notice that the second term decreases
geometrically with group size because cooperation is sustained only in groups
of all reciprocators.
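
To see how quickly group size erodes the reciprocity term, the invasion condition can be evaluated directly. The Python sketch below simply codes the two terms of the inequality, (r(n - 1) + 1)(b/n) - c and r^(n-1)(t - 1)(b - c). The function names and the parameter values are illustrative assumptions, not values taken from chapter 8.

```python
from fractions import Fraction


def invasion_terms(r, n, b, c, t):
    """Return (inclusive fitness term, reciprocity term) of the invasion condition."""
    r, b, c, t = (Fraction(x) for x in (r, b, c, t))
    inclusive_fitness = (r * (n - 1) + 1) * (b / n) - c
    reciprocity = r ** (n - 1) * (t - 1) * (b - c)
    return inclusive_fitness, reciprocity


def can_invade(r, n, b, c, t):
    """True when rare reciprocators increase, i.e. when the two terms sum above zero."""
    return sum(invasion_terms(r, n, b, c, t)) > 0


if __name__ == "__main__":
    r, b, c, t = Fraction(1, 4), 3, 2, 10  # illustrative values only
    for n in (2, 4, 8):
        inclusive, recip = invasion_terms(r, n, b, c, t)
        print(f"n={n}: inclusive={float(inclusive):+.4f}, "
              f"reciprocity={float(recip):.5f}, invade={can_invade(r, n, b, c, t)}")
```

With these values, pairs of half-siblings-once-removed (r = 1/4) can invade after ten interactions, whereas groups of eight require more than 15,873 interactions before the sum becomes positive.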

Strategies supporting contingent cooperation in large groups have to
achieve two competing desiderata. To be stable when common, they must be
intolerant of defection; otherwise, they can’t be evolutionarily stable, as
defectors will prosper. To increase when rare, there must be a substantial
chance that groups will have enough reciprocators. As groups get larger, this
becomes geometrically more difficult.

A number of people have suggested (e.g., Bendor and Mookerjee, 1987) 
that this analysis underestimates the problems facing reciprocity in larger 
groups because contingent cooperation in large groups will be much more 
sensitive to errors than it is in pairs. This claim is true of the particular re- 
ciprocal strategies we analyzed, because a single error would lead to a collapse 
of cooperation in the group. However, we do not think that it is a robust effect 
because the reciprocating strategies in large groups can be modified to deal 
with errors in much the same way that two-person strategies can. For exam- 
ple, the n-person version of Pavlov would use the rule cooperate if everyone or 
no one cooperated on the last turn. Then an error would create universal 
defection, which, on the subsequent interaction, would then generate uni- 
versal cooperation. Strategies analogous to generous tit-for-tat likely could also 
be designed to deal with errors in an n-person setting. 
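
The recovery sequence described for the n-person version of Pavlov can be traced with a short simulation. The Python sketch below is a minimal illustration that assumes every player uses the same rule, that all players observe the same history, and that exactly one player errs once. The function names are hypothetical.

```python
def pavlov_next(moves):
    """n-person Pavlov: cooperate if everyone or no one cooperated on the last turn."""
    cooperators = sum(moves)
    return cooperators == len(moves) or cooperators == 0


def run(n, rounds, error_round, error_player=0):
    """Return the number of cooperators on each turn when one player errs once."""
    moves = [True] * n  # assumption: the group starts out cooperating
    history = []
    for turn in range(rounds):
        if turn == error_round:
            moves[error_player] = False  # a mistaken defection
        history.append(sum(moves))
        # Every player applies the same rule to the same history,
        # so all of them choose the same move for the next turn.
        moves = [pavlov_next(moves)] * n
    return history


if __name__ == "__main__":
    print(run(n=5, rounds=6, error_round=2))
```

In a group of five with an error on the third turn, the cooperator counts run 5, 5, 4, 0, 5, 5: one turn of universal defection follows the error, and full cooperation resumes on the next turn.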

Colleagues have suggested to us that the n-person prisoner's dilemma is 
an extreme case because it assumes that noncooperators cannot be selectively 
excluded from enjoying the benefits of the cooperative act. For example, 
everybody gets the benefits of group defense whether they fight or not. 
Indeed, economists say that such goods are not “excludable.” Perhaps in many 
instances of cooperation in groups, noncooperators can be excluded. Take 
the classic example of food sharing among hunter-gatherers. In most foraging
groups, successful hunters share their catch with the rest of their group, a 
behavior sometimes explained as a reciprocal arrangement that reduces risk of 
starvation. Couldn’t earnest hunters easily exclude guys who don’t hunt? Just 
don’t give them a share of meat. Don’t we need to consider models in which 
the fruits of cooperation are at least partly excludable? Maybe, but the 
problem is a little trickier than it first appears. 

Excluding defectors is an example of a much more general phenomenon.
To prevent a defector from eating, somebody has to intervene when he 
reaches into the pot. That someone has to undertake a (perhaps) costly action 
that reduces the payoff of the defector and thus produces a benefit to the 
group as a whole. This is an example of what Trivers called “moralistic 
punishment” and applies to a much wider range of problems than excluding 
defectors from the fruits of cooperation. Even if the defectors cannot be
excluded, punishment can create incentives for them to cooperate. Cowards
may get the benefits of group defense, but they may also be shunned, 
beaten, or banished. The real question is under what conditions can selection 
favor moralistic punishment? 

In chapter 9 we attempt to answer this question. The model assumes that 
individuals interact repeatedly in an n-person prisoner’s dilemma. After each 
interaction, members have the opportunity to punish any other member of 
the group at a cost to themselves. We analyzed a variety of strategies, but here 
we begin by focusing on just two of them: moralistic punishers cooperate and 
punish defectors, and reluctant cooperators defect until they are punished, and 
then they cooperate. So that punishment could induce cooperation, we 
assume that the cost of being punished is greater than the cost of cooperating. 
Both types occasionally make mistakes and defect when they mean to 
cooperate. In this simple world, there are three types of stable equilibria. First, 
suppose reluctant cooperators are common in the population. They neither 
cooperate nor punish, so they achieve a payoff of zero. Rare mutant punishers 
will punish the n - 1 reluctant cooperators in their group and thereby induce
them to cooperate over the long run. If the long-run benefit of being in a co- 
operative group is less than the one-time cost of punishing, reluctant coop- 
erators are an ESS. However, if the long-run benefit is greater than the cost of 
punishing, moralistic punishment can invade even when groups are formed 
at random. The fact that the reluctant cooperators do better than the mor- 
alistic punishers in their group is unimportant when moralistic punishers are 
rare because the vast majority of reluctant cooperators are in groups without a 
punisher. As moralistic punishers increase in frequency, however, more and 
more reluctant cooperators find themselves in groups with a punisher, and as a 
consequence their relative fitness increases. Eventually the fitness of the two 
types equalizes at a stable polymorphic equilibrium at which the population 
is a mix of cooperative and noncooperative groups. At this equilibrium, 
cooperation arises as a consequence of private individual benefit. We jokingly 
referred to this as the “big man” equilibrium after the famous political/ 
economic system common in New Guinea that it resembles. This model also 
has a second, quite different kind of equilibrium. Suppose that moralistic 
punishers are common. Now rare reluctant cooperators are always punished 
by every other member of their group during the first interaction, and as long 
as the cost of this punishment is less than the cost to moralistic punishers 
of punishing the occasional error, then punishment can sustain cooperation. 
However, it can also stabilize almost any other behavior. The long-run benefits 
of cooperation are irrelevant to the stability of this equilibrium. This is the folk 
theorem again. If almost everybody is going to punish individuals for some
transgression, then individuals must do what the punishers want, no matter how
foolish it is in any other terms.

We think these two very simple models capture a robust difference in 
contingent cooperation and moralistic punishment. Contingent cooperation 
strategies can be stable only if they insist that everyone in the group cooper- 
ate — otherwise, they can be exploited. However, since such strategies increase 
when rare with the greatest difficulty, they are not very likely evolutionary
outcomes. Defecting equilibria are much more likely evolutionary outcomes. 
The directed punishment of moralistic strategies means that a small number of 
punishers can induce others to cooperate and thus achieve the long-run 
benefits of cooperation. If punishment is cheap enough that a single individual 
can induce all other group members to cooperate, then moralistic strategies 
can increase when rare. However, they can never spread to fixation precisely 
because only a few punishers are necessary, and as punishers become common, 
selection favors free riders who accept the benefits but don’t do the police 
work necessary to generate them. We are quite doubtful that this kind of 
equilibrium is common in human groups. As Hobbes pointed out long ago, 
individual men have a similar capacity for inflicting harm. When I push you 
away from the food, you are likely to push back (weapons probably reduce 
differences in fighting ability — God created men, but Sam Colt made them 
equal, frontiersmen quipped). This problem does not afflict moralistic 
equilibria because defectors are rare and punishers are common. However, 
while moralistic punishment is stable, within-group evolutionary processes do 
not make it a likely evolutionary outcome. The fact that directed punishment 
requires only a few punishers is also responsible for the peculiar nature of 
moralistic equilibria. When moralistic punishers are common, mutant non- 
punishers have no effect on whether the group cooperates — all groups will be 
cooperative because there are plenty of punishers everywhere. Thus, while 
such equilibria are stable, individual natural selection has no reason to attach 
such punishment to group-beneficial cooperative behaviors. 

The fact that there are always more than enough punishers at a punisher- 
cooperator equilibrium means that such equilibria can be invaded by “second- 
order free riders,” individuals who cooperate from the first interaction but 
never punish. While much of the debate about moralistic punishment has 
focused on the problem of second-order free riders, we don’t think it is 
a serious obstacle to evolution of cooperation in large groups. First of all, 
“metapunishment,” the punishment of nonpunishers, can evolve. As we show
in chapter 9, this can stabilize punishment. Many people believe that meta- 
punishment doesn’t actually occur in real human societies. However, even if 
this is the case, other solutions to the second-order free rider problem are 
possible. If moralistic punishment is common, and punishments sufficiently 
severe, then cooperating will pay. As a result, most people may go through 
life without having to punish very much. On average, having a predisposition 
to punish may be cheap compared to a disposition to cooperate (in the 
absence of punishment). Thus, relatively weak evolutionary forces can 
maintain a moralistic predisposition. This argument is elaborated in chapter 10 
in which it is shown that very small amounts of conformist social learning can 
stabilize moralistic punishment against second-order free riders, and in 
chapter 13 in which we show that group selection can also stabilize punish- 
ment. Finally, as Eric Smith and his colleagues have pointed out (Smith 
and Bliege Bird, 2000), punishing could be used to signal hard-to-observe 
personal qualities, giving punishers a private reward in the mating game, for 
example. 
Cultural Group Selection

When we were graduate students during the late 1960s and early 1970s, it was 
quite common for biology texts to explain observed traits in terms of their 
benefit to the population or even the species. Reduced reproductive rates 
prevented overpopulation, and sexual reproduction maintained genetic 
variation necessary for the species to adapt. A key advance in biology over the 
last 40 years was to show that such explanations are mostly wrong. Natural 
selection does not normally lead to the evolution of traits that are for the good 
of the species, or population. With some interesting exceptions, selection 
favors traits that increase the reproductive success of individuals, or sometimes 
individual genes, and when there is a conflict between what is good for the 
individual and what is good for the species, or population, selection usually 
leads to the evolution of the trait that benefits the individual. 

Many people mistakenly believe that this means that group selection is 
never important. In the early 1970s, an eccentric engineer named George Price 
published two articles (1970, 1972) that presented a genuinely new way to
think about evolution. Price showed that selection can be thought of as a series 
of nested levels: among genes within an individual, among individuals within 
groups, and among groups. He discovered a very powerful mathematical 
formalism, now called the “Price covariance equation,” for describing these 
processes. To keep things simple, let's suppose that there are two levels. Then 
the change in frequency of a gene undergoing selection is given by 

    Δp = V_G β_G + V_W β_W

The first term gives the change due to selection between groups and is the
product of the variance in frequency between groups (V_G) and the effect of
a change in the frequency of the gene on the reproductive success of the group
(β_G). This makes sense: β_G gives the effect of a change in gene frequency on
group success, and V_G measures how different groups are. The second term,
which gives the change in frequency due to changes within groups, has a similar
form. It is the average over all groups of the product of the variance in
frequency among individuals within the group (V_W) and the effect of a change
in the frequency of the gene on the relative fitness of individuals within
groups (β_W).
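
The two-level decomposition can be checked numerically. The sketch below is only an illustration under stated assumptions, none of which come from the text: groups are equally sized, the gene is carried at frequency p_g in group g, and fitness is 1 + b·p_g − c for carriers and 1 + b·p_g for noncarriers. Under those assumptions β_G = b − c and β_W = −c, and the exact change in frequency equals V_G β_G + V_W β_W divided by mean fitness (a normalization that the simplified equation leaves implicit). The names price_decomposition, b, and c are hypothetical.

```python
def price_decomposition(group_freqs, b, c):
    """Return (delta_p, between_term, within_term) for equal-sized groups.

    Assumed fitness model (illustrative, not from the text):
      carrier fitness    = 1 + b * p_g - c
      noncarrier fitness = 1 + b * p_g
    where p_g is the carrier frequency in group g.
    """
    m = len(group_freqs)
    p = sum(group_freqs) / m                       # overall frequency

    # Mean fitness of each group and of the whole population.
    group_fitness = [1 + (b - c) * pg for pg in group_freqs]
    mean_fitness = sum(group_fitness) / m

    # Exact frequency after selection: carriers weighted by their fitness.
    carriers_after = sum(pg * (1 + b * pg - c) for pg in group_freqs) / m
    delta_p = carriers_after / mean_fitness - p

    # Variance between groups and average variance within groups.
    v_between = sum((pg - p) ** 2 for pg in group_freqs) / m
    v_within = sum(pg * (1 - pg) for pg in group_freqs) / m

    beta_group = b - c      # effect of gene frequency on group fitness
    beta_within = -c        # effect of carrying the gene within a group

    between_term = v_between * beta_group / mean_fitness
    within_term = v_within * beta_within / mean_fitness
    return delta_p, between_term, within_term


if __name__ == "__main__":
    dp, between, within = price_decomposition([0.0, 0.2, 0.9], b=0.5, c=0.1)
    print(dp, between, within)
```

In this toy example the between-group term is positive and the within-group term is negative, so the sign of the total change depends on which term is larger.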

This equation makes it easy to see why selection does not lead to the 
evolution of traits that are beneficial to whole populations if there is any harm 
to individuals. A gene is beneficial to the group when increasing the frequency of 
the gene increases group fitness, or β_G > 0. If it is costly to the individual, then
β_W < 0. The magnitude of these two terms depends on the details of the
particular situation — you can't say anything in general. However, theory tells us 
that when groups are large, with even a small amount of migration among them, 
the variance between individuals (V_W) will be about n times bigger than the
variance between groups (V_G; Rogers, 1990). Unless the group benefit is on the
order of n times the cost, selection will eliminate the group-beneficial gene. But 
when this is the case, the trait is individually beneficial averaged over all groups. 
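
To get a feel for the factor of n, consider the simplest case, which is an assumption made here for illustration and not the migration model analyzed by Rogers (1990): groups of size n formed by drawing individuals at random from a population in which the gene has frequency p. The number of carriers per group is then binomial, and the ratio of the average within-group variance to the between-group variance works out to n − 1. The function name variance_components is hypothetical.

```python
from math import comb


def variance_components(n, p):
    """Between- and within-group variance for randomly formed groups of size n."""
    v_between = 0.0
    v_within = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)   # binomial probability of k carriers
        freq = k / n                                     # carrier frequency in that group
        v_between += prob * (freq - p) ** 2
        v_within += prob * freq * (1 - freq)
    return v_between, v_within


if __name__ == "__main__":
    for n in (5, 10, 50):
        vb, vw = variance_components(n, p=0.3)
        print(n, vw / vb)
```

With a ratio this large, the between-group term outweighs the within-group term only if the group benefit is roughly n times the individual cost, which is the condition stated above.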
However, this doesn't mean that group selection is unimportant. We have
just seen that when groups of individuals interact over long periods of time, 
any behavior can be evolutionarily stable within groups. Moreover, multi- 
ple stable equilibria can also arise from the conformist tendency in social 
learning discussed in chapters 1, 5, and 11. When lots of alternative equilibria
exist, we need a theory that tells us which equilibrium will be the long-run 
evolutionary outcome — what game theorists call the equilibrium selection 
problem. We argue in several articles that selection among groups favors the 
most group-beneficial equilibrium. To see why this is plausible, consider 
the Price equation, and suppose that there are two inherited traits; both are 
stable within groups when common, but one leads to higher rates of group 
reproduction. This means that, as before, β_G > 0. Because both traits are
favored by selection when they are common, each trait will be favored in some
groups, so that the average value of β_W can be either positive or negative.
However, as long as there is not too much migration, most of the groups will 
be near one equilibrium or the other. So the variance among groups will be 
much larger than the variance within groups, independent of group size. The 
reason for this discrepancy is simple: when traits are individually advanta- 
geous, selection and migration are working together to make all groups the 
same; the only process making groups different is genetic drift, which depends 
strongly on population size. When there are multiple equilibria, selection is 
driving groups toward different alternative stable equilibria, creating lots of 
stable between-group variation. Thus, selection between groups generates 
group-beneficial outcomes. 

While the Price equation makes it easy to understand the logic of selection 
at the group level, it also conceals crucial details about population structure 
and the mode of intergroup competition. Evolutionary geneticists have stud- 
ied a range of population structures ranging from “stepping stone” models 
in which groups exchange migrants with a small number of neighbors to 
“Wright Island” models in which all groups are connected by migration. Such 
models have incorporated two modes of intergroup competition: the group- 
beneficial trait can increase the productivity of the group so that it produces 
more emigrants, called “differential proliferation,” or it can reduce the ex- 
tinction rate of the group, called “differential extinction.” The basic conclu- 
sion of theoretical work on the evolution of altruism is that these details don't 
matter much (e.g., Aoki, 1982; Rogers, 1990). However, when there are 
multiple equilibria, the population structure and modes of group competition 
matter a lot. In Boyd and Richerson (1990), we show that when there are 
multiple equilibria, and within-group adaptive processes (selection or selection- 
like biased cultural transmission) are strong, the equilibrium with the lowest 
extinction rate spreads under a wide range of conditions. Groups can be large 
and migration rates substantial. The main requirement is that habitats emptied 
by extinction are colonized by individuals drawn mostly from a single group. 
Interestingly, make this a differential proliferation model and group selection 
has no effect. The same process that preserves variation between groups 
prevents a steady trickle of immigrants from groups at the group-beneficial 
equilibrium from having much effect on groups at the other equilibrium. 
Extinction, coupled with recolonization by a single other group, means
that groups become crude “individuals” that reproduce their own group
characteristics. 

We also wanted to know whether intergroup competition will lead to 
change on the right time scales to explain observed rates of cultural evolution. 
Obviously, this depends on how often groups go extinct. So, working with 
Joseph Soltis, we estimated an upper bound on the rate of cultural evolution 
by this kind of intergroup competition using ethnographic data from New 
Guinea societies. This analysis (chapter 11) indicates that intergroup com- 
petition leads to the evolution of group-beneficial cultural traits on 500- to 
1,000-year time scales, too slow to account for much cultural change. On the 
other hand, major change in social institutions is a slow process; witness 
the relatively slow growth in sophistication of complex societies over the past 
5,000 years. The model may apply to conservative aspects of cultural 
change. Much historic and prehistoric cultural change has a time scale of 
a millennium or more. 

Intergroup competition is not the only mechanism that can lead to the 
spread of group-beneficial cultural variants — a propensity to imitate successful 
neighbors can also lead to the spread of group-beneficial variants. Plausibly, 
people often know something about what goes on in neighboring groups. 

Now, suppose that neighboring groups are at different equilibria and that one 
of the equilibria is better, meaning that it makes people in that group better 
off. Then, behaviors could spread from groups at high payoff equilibria to 
neighboring groups at lower payoff equilibria because people imitate their 
more successful neighbors. To see whether this mechanism could actually 
work, we analyzed the model presented in chapter 12, and our results suggest 
that it can lead to the spread of group-beneficial beliefs as long as groups 
are connected to only a small number of neighboring groups (in a stepping 
stone population structure) so that the success of one group can affect 
neighbors enough to cause them to tip from one equilibrium to the other. The 
model also suggests that such spread can be rapid. Roughly speaking, it takes 
about twice as long for a group-beneficial trait to spread from one group to 
another as it does for an individually beneficial trait to spread within a 
group. This process is faster than intergroup competition because it depends 
on the rate at which individuals imitate new strategies, rather than the 
rate at which groups become extinct. 

These models suggest that the evolution of cooperative norms is a side 
effect of rapid, cumulative cultural adaptation. Adaptation by cultural evo- 
lution brings significant benefits, especially in the climatic chaos of the later 
Pleistocene epoch. However, it also generates lots of variation between 
groups; thus, group selection is a much more important force in human 
cultural evolution than it is in genetic evolution. We think the best evidence 
from archaeology suggests that humans first began to rely on cumulative 
cultural adaptations roughly a half million years ago. If this inference is 
correct, humans have been living in social environments shaped by group 
selection for a long time. In chapter 14 (with Joe Henrich), we argue that in 
such social environments, ordinary natural selection will favor psychological 
mechanisms like empathy, guilt, and shame that make it more likely that individuals
behave prosocially. The coevolutionary response of our innate social
instincts to the selection pressures of living in rule-bound, prosocial tribal- 
scale communities substantially reshaped our social psychology. 

In chapter 14 we argue that cultural group selection and moralistic 
punishment are both important to explaining cooperation. Cultural group 
selection will favor groups with high frequencies of moralistic punishment, 
and it helps ensure that moralistic punishment enforces functional norms. 
Moralistic punishment, as we have said, plays a considerable role in main- 
taining between-group variation on which cultural group selection acts. We 
believe that the tilt of the modeling results and of the empirical data distinctly 
favors what we call in this chapter the tribal social instincts hypothesis. At 
minimum, we believe that the case is sufficiently strong to lift the burden of 
proof that group selection hypotheses have labored under. 
The Evolution of Reciprocity
in Sizable Groups 


Several lines of evidence suggest that sizable groups of people 
sometimes behave cooperatively, even in the absence of external sanctions 
against noncooperative behavior. For example, in many food foraging groups, 
game is shared among all members of the group regardless of who makes the kill 
(e.g., Kaplan and Hill, 1984; Lee, 1979; Damas, 1971). In many other stateless 
societies, men risk their lives in warfare with other groups (e.g., Meggit, 1977). 
There is also evidence that a great deal of cooperation takes place in contem- 
porary state-level societies without external sanctions. For example, people 
contribute to charity, give blood, and vote — even though the effect of their own 
contributions on the welfare of the group is negligible. The groups benefiting are 
often very large and composed of very distantly related individuals. Perhaps the 
most dramatic examples of cooperation in contemporary societies are under- 
ground movements such as Poland’s Solidarity in which people cooperate to 
achieve a common goal in opposition to all of the power of the modern state (see 
Olson, 1971, 1982, and Hardin, 1982, for further examples). Because of the anecdotal
nature of these data, it is possible to doubt any particular example.
However, psychologists and sociologists have also shown that people cooperate 
under carefully controlled laboratory conditions, albeit for smaller stakes. For 
example, Marwell and Ames (1978, 1980) presented individual students with 
two alternative investments: a low return private investment in which profits 
accrued to the individual, and a higher return investment in which returns ac- 
crued to all group members whether they invested or not. Students invested in 
the group-beneficial investment at a much higher rate than that consistent with 
rational self-interest. (See Dawes, 1980, for a review of such experiments.) 
The fact that people cooperate in sizable groups is puzzling from an evolutionary
viewpoint. According to contemporary evolutionary theory, cooperative
behavior can evolve only through one of two mechanisms: inclusive fitness 
effects (Hamilton, 1975) or reciprocity (Trivers, 1971). Inclusive fitness effects
occur when social groups form so that cooperators are more likely to interact with
other cooperators than with noncooperators. There has been controversy over
what processes of group formation suffice to allow cooperation. Some authors
(e.g., Maynard Smith, 1976) have argued that groups must be comprised of
genetic relatives for cooperation to be favored. Others (e.g., Wilson, 1980; Wade,
1978) have argued that other mechanisms suffice. We believe that most authors
would agree that inclusive fitness effects can give rise to cooperation among 
mammals only in relatively small groups. With the exception of humans, this 
prediction is supported by observations of mammalian social behavior. The rel- 
atively few animal societies that have levels of cooperation similar to those of 
humans are typically composed of close relatives (Wilson, 1975; Jarvis, 1981),
while cooperation in large groups among humans includes cases where co- 
operators are virtually unrelated. 

Cooperation may also arise through reciprocity when individuals interact 
repeatedly. Several related analyses (Axelrod, 1984; Axelrod and Hamilton, 
1981; Brown, Sanderson, and Michod, 1982; Aoki, 1983; Peck and Feldman, 
1986) suggest that cooperation can arise via reciprocity when pairs of individuals
interact repeatedly. These results suggest that the evolutionary equilibrium in this 
setting is likely to be a contingent strategy with the general form “cooperate the 
first time you interact with another individual, but continue to cooperate only if 
the other individual also cooperates.” Some authors have conjectured that reci- 
procity can lead to cooperation in larger groups through a similar mechanism 
(Trivers, 1971; Flinn and Alexander, 1982; Alexander, 1985, 1987:93ff). However,
since there has been no explicit theoretical treatment of the evolution of
behavior when there are repeated interactions in groups larger than two in- 
dividuals, it is unclear whether this conjecture is correct. 

The goal of this chapter is to clarify this issue by extending existing theory to 
explicitly include repeated interactions in large groups. We begin by reviewing 
the evolutionary models of the evolution of reciprocity. We then present a model 
of the evolution of reciprocal cooperation in sizable groups. An analysis of this 
model suggests that the conditions necessary for the evolution of reciprocity 
become extremely restrictive as group size increases. 


Models of the Evolution of Reciprocal Cooperation 

For the most part, evolutionary models of cooperation have been developed 
by biologists interested in explaining cooperative behavior among nonhuman 
animals. (See Wade, 1978; Uyenoyama and Feldman, 1980; Michod, 1982; 
Wilson, 1980, for reviews.) These assume that individual differences in social
behavior, including the strategies that govern individual behavior in potentially 
reciprocal social interactions, are affected by heritable genetic differences. They 
further assume that the outcome of potentially cooperative social interactions
affects an individual’s reproductive success. Successful behavioral strategies will,
thus, increase in the population through natural selection. The question then is: 
under what conditions will natural selection favor behavioral strategies that lead 
to cooperation? The answer to this question should illuminate contempo- 
rary human cooperation to the extent that evolved propensities shape human 
behavior. 

If behavioral strategies are transmitted culturally instead of genetically, evo- 
lutionary models also provide insight into the conditions under which coopera- 
tive behavior will arise in contemporary societies. Some authors (Axelrod, 1984; 
Brown et al., 1982; Maynard Smith, 1982; Pulliam, 1982; Boyd and Richerson,
1982, 1985) have constructed models, formally quite similar to the genetic ones,
which assume that behavioral strategies are transmitted from one individual to 
another culturally, by teaching, imitation, or some other form of social learning. 
These models assume that the probability that a strategy is transmitted culturally 
is proportional to the average payoff associated with that strategy. There are many 
plausible ways in which this can occur. For example, it may be that people tend to 
imitate wealthy or otherwise successful individuals. (For discussions of the rela- 
tionship between genetic and cultural evolution, see Cavalli-Sforza and Feldman, 
1981; Lumsden and Wilson, 1981; and Boyd and Richerson, 1985.)

The recent work of several authors (Boorman and Levitt, 1980; Axelrod, 
1980, 1984; Axelrod and Hamilton, 1981; Brown et al., 1982; Aoki, 1983; Peck 
and Feldman, 1986; Boyd and Lorberbaum, 1987) suggests that natural selection
may favor reciprocity when pairs of individuals interact a sufficiently large number 
of times. These models share many common features. Each assumes a population 
of individuals. Pairs of individuals sampled from this population interact a number
of times. During each interaction, individuals may either cooperate (C) or
defect (D). Table 8.1 gives the incremental effect of each interaction on the fitness
of the members of a pair. This pattern of fitness payoffs defines a single period
prisoner’s dilemma; it means that cooperative behavior is altruistic in the sense
that it reduces the fitness of the individual performing the cooperative behavior,
but increases fitness of the other individual in the pair (Axelrod and Hamilton,
1981; Boyd, 1988). By assumption, each individual is characterized by an
inherited strategy that determines how it will behave. Strategies may be fixed
rules like unconditional defection (“always defect”), or contingent ones like
tit-for-tat (“cooperate during the first interaction; subsequently do whatever the
other individual did last time”). The pair’s two strategies determine the effect of
the entire sequence of interactions on each pair member’s fitness.

Table 8.1. The incremental effect of interactions on the fitness
of the members of a pair

                         Player 2
                         C          D
    Player 1     C       R, R       S, T
                 D       T, S       P, P

Each player has the choice of two strategies, C for cooperate and D for
defect. The pairs of entries in the table are the payoffs for players 1 and 2,
respectively, associated with each combination of strategies. In the case of the
prisoner’s dilemma it is assumed that T > R > P > S, and 2R > S + T.

This literature produces three main conclusions about the evolution of 
reciprocity: 

1. Reciprocating strategies, like tit-for-tat, that lead to mutual cooperation
are successful if pairs of individuals are likely to interact many times. There is 
some dispute about what kinds of reciprocating strategies are most likely to be 
successful, and whether any pure strategy can be evolutionarily stable (Boyd and 
Lorberbaum, 1987; Hirshleifer and Martinez Coll, 1988). But it seems plausible
there will be a stable equilibrium at which reciprocators are common whenever 
interactions last long enough. 

2. A population in which unconditional defection is common can resist 
invasion by cooperative strategies under a wide range of conditions. When a 
population is mostly made up of individuals who never cooperate, and in- 
dividuals are paired randomly, rare reciprocators are overwhelmingly likely to be 
paired with unconditional defectors. Reciprocators suffer because of their will- 
ingness to cooperate initially. In many situations, it is plausible that cooperative 
behavior is the derived condition. Thus, to explain the existence of reciprocal 
behavior, we must solve the puzzle of how reciprocating strategies increase 
when rare. 

3. There seems to be a variety of plausible mechanisms that allow recip- 
rocating strategies to increase when rare. Axelrod and Hamilton (1981; Axelrod, 
1984) have shown that a very small degree of assortative group formation, when
coupled with the possibility of prolonged reciprocity, allows strategies like tit-
for-tat to invade noncooperative populations. Peck and Feldman (1986) have
shown that the costs of cooperative behavior can be frequency dependent in such
a way that cooperation increases when rare. Finally, Boyd and Lorberbaum
(1987) show that if mutation or phenotypic variation is present, unconditional
defection can be invaded even when groups are formed at random. 
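
The first two conclusions can be illustrated with the standard repeated-game payoffs for tit-for-tat (TFT) and unconditional defection (ALLD). The sketch assumes, purely for illustration, that after each interaction the pair interacts again with a fixed probability w, so that payoffs are geometric sums. The numerical payoff values and the function names are hypothetical.

```python
def repeated_payoffs(T, R, P, S, w):
    """Expected total payoffs in a repeated prisoner's dilemma.

    w is the (assumed) probability that a pair interacts again after each
    interaction. Keys are (focal strategy, partner strategy).
    """
    horizon = 1 / (1 - w)                         # expected number of interactions
    return {
        ("TFT", "TFT"): R * horizon,              # mutual cooperation throughout
        ("TFT", "ALLD"): S + w * P * horizon,     # exploited once, then mutual defection
        ("ALLD", "TFT"): T + w * P * horizon,     # exploits once, then mutual defection
        ("ALLD", "ALLD"): P * horizon,            # mutual defection throughout
    }


def tft_resists_alld(T, R, P, S, w):
    """True if a rare unconditional defector does worse than common reciprocators."""
    pay = repeated_payoffs(T, R, P, S, w)
    return pay[("TFT", "TFT")] > pay[("ALLD", "TFT")]


def alld_resists_tft(T, R, P, S, w):
    """True if a rare reciprocator does worse than common defectors (random pairing)."""
    pay = repeated_payoffs(T, R, P, S, w)
    return pay[("ALLD", "ALLD")] > pay[("TFT", "ALLD")]


if __name__ == "__main__":
    for w in (0.2, 0.9):
        print(w, tft_resists_alld(5, 3, 1, 0, w), alld_resists_tft(5, 3, 1, 0, w))
```

With these illustrative payoffs, reciprocators resist defectors only when w is large, while defectors resist rare reciprocators at any w, because the rare reciprocator pays the cost of its initial cooperation.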

This theory suggests a robust conclusion: lengthy paired interactions favor 
reciprocity. We have suspected that this conclusion is sensitive to group size, for 
in larger groups, enforcing individuals bear the full cost of punishing defectors 
while the benefit of enforcement flows to the whole group. (See Boyd and 
Richerson, 1985, 228-230, for a simple game-theoretic presentation of this intuition.)
Authors like R. D. Alexander (1985, 1987:93ff), however, have argued
that reciprocity can lead to cooperation in sizable groups. Thus, we offer an 
explicit investigation of repeated interactions in groups larger than two. 


Model Assumptions 

Our model closely resembles evolutionary models of reciprocity in pairs. Sup- 
pose there is a population of individuals — each characterized by an inherited 
strategy. Groups are formed by sampling n individuals from the population who 
interact in a repeated n-person prisoner’s dilemma. Each individual’s payoff
depends on his strategy and the strategies used by the n — 1 other individuals 
in the group. The representation of any strategy in the next generation is a 
monotonically increasing function of the average payoff received by individuals 
playing that strategy during the previous period. (As argued by Brown et al., 
1982, this assumption is consistent with haploid genetic inheritance of strategies 
and some simple forms of cultural transmission.) We then ask which strategies or 
combinations of strategies can persist. 

We use an n-person prisoner’s dilemma to model cooperation among a 
group of individuals (e.g., Schelling, 1978; Taylor, 1976; for alternative for- 
mulations, see Taylor and Ward, 1982; Hirshleifer, 1983). In any time period, 
each individual can choose either to cooperate (C) or to defect (D). An indi- 
vidual’s payoff in a single time period depends on her own behavior and on the 
number of cooperators in the group. Let V(C | i) and V(D | i) be the payoffs to
individuals choosing cooperation and defection, given that i of the n individuals 
in the group choose cooperation. The n-person prisoner’s dilemma demands that 
these payoffs have the following properties: 

1. In any interaction, each individual is better off choosing D, no matter 
what the other n — 1 individuals in the group choose. Thus: 

    V(D | i) > V(C | i + 1),   i = 0, ..., n − 1   (1)

This assumption formalizes the notion that altruistic behavior is costly to the 
individual. If groups are formed at random, and interact only once, this as- 
sumption guarantees that cooperative behavior cannot evolve (Nunney, 1985). 

2. If an individual switches from defection to cooperation, every other 
member of the group is better off. This requires that: 


    V(D | i + 1) > V(D | i)
    V(C | i + 1) > V(C | i)   (2)


This assumption formalizes the idea that cooperation benefits other members of 
the group. 

3. The average fitness of individuals in the group increases if one switches 
from defection to cooperation. This requires: 


    (i + 1) V(C | i + 1) + (n − i − 1) V(D | i + 1)
        > i V(C | i) + (n − i) V(D | i)   (3)

where i = 0, ..., n − 1. This assumption formalizes the idea that the fitness
benefits to the whole group from cooperative behavior exceed the fitness costs of 
cooperating. 

We are free to choose the units in which payoffs are accounted. We can thus 
specify that V(D | 0) = 0 and V(C | n) = B, where B is a positive constant. When groups
consist of only two individuals, these three conditions generate a slightly stronger form
of the prisoner’s dilemma than usual. That is, all three require that T > R, P > S, and
R > (T + S)/2 > P rather than the two inequalities listed in table 8.1.
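
The three defining properties can be checked mechanically for any candidate payoff functions. The following is a minimal sketch, not the authors' procedure. The helper names is_n_person_pd and two_player are hypothetical, and the index ranges used for property 2 are an assumption: they cover only the values of i for which each payoff is defined (a cooperator needs at least one cooperator in the group, and a defector at most n − 1).

```python
def is_n_person_pd(V_C, V_D, n):
    """Check properties (1)-(3) for payoff functions V_C(i) and V_D(i).

    V_C(i): payoff to a cooperator when i group members cooperate (1 <= i <= n).
    V_D(i): payoff to a defector when i group members cooperate (0 <= i <= n - 1).
    """
    # (1) Defecting always pays: V(D|i) > V(C|i+1) for i = 0, ..., n - 1.
    cond1 = all(V_D(i) > V_C(i + 1) for i in range(n))

    # (2) One more cooperator makes every other group member better off.
    cond2 = (all(V_D(i + 1) > V_D(i) for i in range(n - 1)) and
             all(V_C(i + 1) > V_C(i) for i in range(1, n)))

    # (3) Total group payoff rises when one member switches from D to C.
    def total(i):
        coop = i * V_C(i) if i > 0 else 0
        defect = (n - i) * V_D(i) if i < n else 0
        return coop + defect

    cond3 = all(total(i + 1) > total(i) for i in range(n))
    return cond1 and cond2 and cond3


def two_player(T, R, P, S):
    """Map the pairwise payoffs of table 8.1 onto the n = 2 case."""
    V_C = {1: S, 2: R}.__getitem__
    V_D = {0: P, 1: T}.__getitem__
    return is_n_person_pd(V_C, V_D, 2)


if __name__ == "__main__":
    print(two_player(5, 3, 1, 0))   # satisfies R > (T + S)/2 > P
    print(two_player(5, 4, 3, 0))   # ordinary prisoner's dilemma, but (T + S)/2 < P
```

The second call uses payoffs that satisfy T > R > P > S and 2R > S + T yet fail property 3, which illustrates why the two-person case here is slightly stronger than the usual definition.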

We derive many of our results here assuming that the payoff to each individual
in a group during each interaction is a linear function of the number of
individuals who cooperated during that interaction. Let the number of individuals
choosing C during a particular turn be i. Then, the payoffs to individuals
choosing C and D are:

    V(C | i) = (B/n) i − c

and

    V(D | i) = (B/n) i   (4)

From the definition of the n-person prisoner’s dilemma, it must be that B > c and 
c > B/n. This model is identical to the linear model of social interactions used in
most kin selection models. Economists and political scientists have used various 
versions of this model to represent the investment in public goods (Hardin, 
1982), although Hirshleifer (1983) shows that nonlinear payoffs can strongly
affect the advantages of cooperation. Two polar cases of the linear payoff model 
are of particular interest: the case in which B is constant with respect to n, and 
the case in which B is proportional to n. The first represents situations in which 
the benefits produced by a cooperative act are divided up among group mem- 
bers, so that increasing group size decreases the benefit per individual group 
member. The second case represents situations in which the benefits reaped by 
one individual do not reduce the benefits received by another. 
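
The two inequalities on B and c follow from substituting the linear payoffs into the defining properties of the n-person prisoner's dilemma. The short derivation below is a sketch of that step. The symbol W(i), for the total payoff to a group containing i cooperators, is introduced here for convenience and does not appear in the text.

```latex
\begin{align*}
% Property 1: V(D|i) > V(C|i+1)
\frac{B}{n}\,i &> \frac{B}{n}\,(i+1) - c
  &&\Longleftrightarrow\quad c > \frac{B}{n} \\[4pt]
% Property 2 holds for any B > 0, since both payoffs increase by B/n per cooperator.
% Property 3: total group payoff with i cooperators
W(i) &= i\left(\frac{B}{n}\,i - c\right) + (n-i)\,\frac{B}{n}\,i = (B - c)\,i \\[4pt]
W(i+1) &> W(i)
  &&\Longleftrightarrow\quad B > c
\end{align*}
```

In the first polar case (B constant), the condition c > B/n becomes easier to satisfy as n grows. In the second (B proportional to n), B/n is constant, so the same condition does not depend on group size.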

Groups of n individuals are sampled from the population and interact
repeatedly in the n-person prisoner's dilemma just described. The probability that
a given group interacts more than t times is w^t, where w is a constant between
zero and one. This assumption means that the expected number of interactions
among the n individuals is 1/(1 - w). Thus, as w increases, so does the number of
interactions among a group of n individuals. If w ≈ 0, individuals usually
interact only once. If w ≈ 1, then individuals interact many times.
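The expected number of interactions is the sum of the survival probabilities w^t over t = 0, 1, 2, .... A small Python sketch (function names are illustrative assumptions) compares the closed form with a truncated sum:

```python
def expected_interactions(w):
    """Closed form 1/(1 - w) for the expected number of interactions."""
    return 1.0 / (1.0 - w)


def truncated_survival_sum(w, terms=2000):
    """Sum of P(group interacts more than t times) = w**t for t = 0..terms-1."""
    return sum(w ** t for t in range(terms))
```

For w = 0.9 both give about 10 interactions, and for w = 0.99 the closed form gives about 100.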

Each individual is characterized by an inherited “strategy” that specifies
whether the individual will choose cooperation or defection during any time
period based on the history of the group up to that point. In this analysis, we
consider only the following strategies:

U: always defect.

T_a: cooperate on the first move and then cooperate on each subsequent
move if a or more of the other n - 1 individuals in the group chose
cooperation during the previous time period.

The set of strategies T_a is a generalization of tit-for-tat. In the n-person case,
there are n - 1 such contingent strategies (T_a with a = 1, ..., n - 1), one for each
of the possible rules of the form “cooperate if a or more individuals cooperated
on the last move.” Taylor (1976) introduced this family of strategies. We begin
by assuming that populations consist of only two strategies, U and T_a, in which a
takes on some particular value. Later we will consider populations in which
three or more strategies are present.

In populations in which only U and T_a are present, an individual’s expected
fitness depends only on its own strategy and on the number of T_a individuals
among the other n - 1 individuals in its social group. To see this, consider the
expected fitness of a T_a individual in a group in which j other individuals use the
strategy T_a. The U individuals in such a group always play D. The T_a individuals
always cooperate on the first interaction. They continue to cooperate as long as
a or more of the other n - 1 individuals cooperated last time. If j ≥ a, the T_a
individuals play C during every interaction. This means that during each time
period the payoff to T_a individuals is V(C|j+1). The effects of social interaction
on the fitness of any particular individual depend on the number of time periods
that individual’s group interacts. If j ≥ a, the average payoff to T_a individuals,
over all groups with j other cooperators, F(T_a|j), is:

$$F(T_a \mid j) = V(C \mid j+1)\,(1 + w + w^2 + w^3 + \cdots) = \frac{V(C \mid j+1)}{1-w} \tag{5}$$

If j < a, the T_a individuals cooperate during the first interaction and defect
thereafter. This means that the payoff to T_a individuals is V(C|j+1) during the
first period and V(D|0) during any subsequent periods. Thus,

$$F(T_a \mid j) = V(C \mid j+1) + V(D \mid 0)\,(w + w^2 + w^3 + \cdots) = V(C \mid j+1) + \frac{w\,V(D \mid 0)}{1-w} \tag{6}$$

A similar argument shows that the expected payoff to U individuals in groups in
which j of the other n - 1 individuals are characterized by the strategy T_a is as
follows:

$$F(U \mid j) = \begin{cases} \dfrac{V(D \mid j)}{1-w} & j > a \\[1ex] V(D \mid j) + \dfrac{w\,V(D \mid 0)}{1-w} & j \le a \end{cases} \tag{7}$$

The case j = a falls in the second branch because each of the a reciprocators
then sees only a - 1 other cooperators, so cooperation collapses after the first
interaction.
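Equations (5) through (7) can be sketched in Python for the linear payoffs of equation (4). This is an illustrative sketch under that assumption, not the authors' implementation; the function names are hypothetical.

```python
def v_c(i, n, B, c):
    """V(C|i) from equation (4)."""
    return B * i / n - c


def v_d(i, n, B):
    """V(D|i) from equation (4); note V(D|0) = 0."""
    return B * i / n


def payoff_reciprocator(j, a, n, B, c, w):
    """F(T_a|j): expected total payoff to a T_a individual with j other T_a."""
    if j >= a:  # cooperation persists for the life of the group, equation (5)
        return v_c(j + 1, n, B, c) / (1 - w)
    # cooperate once, then everyone defects, equation (6)
    return v_c(j + 1, n, B, c) + w * v_d(0, n, B) / (1 - w)


def payoff_unconditional(j, a, n, B, c, w):
    """F(U|j): expected total payoff to a U individual with j T_a groupmates."""
    if j > a:  # the reciprocators still see at least a cooperators
        return v_d(j, n, B) / (1 - w)
    # reciprocators cooperate once, then defect, equation (7)
    return v_d(j, n, B) + w * v_d(0, n, B) / (1 - w)
```

With n = 4, a = 3, B = 2, c = 1, and w = 0.9, a reciprocator among three others earns about 10 over the life of the group, while a defector in its place earns only 1.5.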


After the episode of social behavior that generates these payoffs, individuals
in the population reproduce. We assume that individual fitness is the sum of a
baseline fitness W_0 and the payoff resulting from social interaction. We further
assume that W_0 ≫ F(T_a|j), F(U|j) for all values of j, meaning that selection
acting on social behavior is weak. The expected fitness of T_a averaged over all the
different kinds of groups, W(T_a), is given by:

$$W(T_a) = \sum_{j=0}^{n-1} m(j \mid T_a)\,\{W_0 + F(T_a \mid j)\} \tag{8}$$

The term in braces is the expected fitness of a T_a individual in a group with j other
T_a individuals. This term is multiplied by the probability that a T_a individual finds
herself in such a group, m(j|T_a), and is summed over all possible groups.
Similarly, the expected fitness of an unconditional defector, W(U), is the following:

$$W(U) = \sum_{j=0}^{n-1} m(j \mid U)\,\{W_0 + F(U \mid j)\} \tag{9}$$

where m(j|U) is the probability that a U individual finds herself in a group in
which there are j T_a individuals.

If the frequency of T_a in the population before social interaction is p, then
the frequency before social interaction in the next generation, p', is:

$$p' = p + p(1-p)\,\frac{W(T_a) - W(U)}{\bar{W}} \tag{10}$$

where

$$\bar{W} = p\,W(T_a) + (1-p)\,W(U)$$

To determine the long-run evolutionary outcome, we determine what
frequencies of T_a and U represent stable equilibria of the recursion (10).
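Recursion (10) is easy to iterate numerically. The Python sketch below assumes linear payoffs from equation (4) and random (binomial) group formation, which the text introduces in the next section; the baseline fitness value W0 = 1000 and the function names are illustrative assumptions.

```python
from math import comb


def strategy_fitnesses(p, a, n, B, c, w, W0):
    """Return (W(T_a), W(U)) from equations (8) and (9) with binomial m(j)."""
    W_T = 0.0
    W_U = 0.0
    for j in range(n):  # j = number of T_a among the other n - 1 individuals
        m_j = comb(n - 1, j) * p ** j * (1 - p) ** (n - 1 - j)
        coop = B * (j + 1) / n - c   # V(C|j+1)
        defect = B * j / n           # V(D|j); V(D|0) = 0 drops the tail terms
        F_T = coop / (1 - w) if j >= a else coop
        F_U = defect / (1 - w) if j > a else defect
        W_T += m_j * (W0 + F_T)
        W_U += m_j * (W0 + F_U)
    return W_T, W_U


def next_frequency(p, a, n, B, c, w, W0=1000.0):
    """One generation of recursion (10)."""
    W_T, W_U = strategy_fitnesses(p, a, n, B, c, w, W0)
    W_bar = p * W_T + (1 - p) * W_U
    return p + p * (1 - p) * (W_T - W_U) / W_bar
```

Calling `next_frequency` repeatedly from a starting frequency traces the trajectory; p = 0 and p = 1 are fixed points because the factor p(1 - p) vanishes there.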


Evolution of Reciprocity When Groups Are Formed Randomly 


We begin by assuming that groups form randomly. This assumption means that
individuals do not interact with genetic relatives, nor are they able to assort
themselves based on observable phenotypic characteristics. In the special case
of pairs, theory (reviewed earlier) suggests that strategies leading to reciprocal
cooperation can evolve as long as individuals interact a large enough number of
times. We want to know how increasing group size will affect this conclusion.
We formalize this assumption by specifying that both m(j|T_a) and m(j|U) are
binomial probability distributions with parameter p, labeled m(j).

According to equation (10), the frequency of T_a will increase whenever the
expected fitness of T_a, W(T_a), is greater than the expected fitness of U, W(U)
(unless the population is at an equilibrium point, in which case there is no
change). When groups are formed at random, the condition for T_a to increase has
the following particularly simple and instructive form:

$$\sum_{j=0}^{a-1} \big[V(D \mid j) - V(C \mid j+1)\big]\,m(j) + \frac{1}{1-w}\sum_{j=a+1}^{n-1} \big[V(D \mid j) - V(C \mid j+1)\big]\,m(j) < \left[\frac{V(C \mid a+1)}{1-w} - V(D \mid a)\right] m(a) \tag{11}$$

where if the upper bound of the sum is less than the lower bound, the sum is zero
by convention. This expression says that T_a individuals have a fitness advantage
relative to U individuals only in groups in which a single additional defector will
cause cooperation to collapse. For T_a to be favored by selection, the advantage it
gains in such groups must be larger than the disadvantage T_a suffers in all other
groups. To see this, consider each of the three terms in this expression. The first
term represents the sum of the fitness advantages of U individuals in groups in
which fewer than a of the other n - 1 individuals are reciprocators, weighted by
the probability that such groups form. In such groups, T_a individuals cooperate
only once, and U individuals do not cooperate at all. The definition of the
n-person prisoner’s dilemma guarantees that V(C|j+1) < V(D|j). This term is
therefore always positive. The second term represents the average fitness
advantage of unconditional defection in groups in which more than a of the other
n - 1 individuals are reciprocators. This term is multiplied by 1/(1 - w), the
expected number of interactions, because in such groups T_a individuals cooperate
and U individuals defect for as long as the group persists. Again, this term is
always positive. The right-hand side of the inequality gives the difference
between the fitness of the two strategies in groups in which exactly a of the other
n - 1 individuals are reciprocators, multiplied by the probability that such groups
form. A T_a individual in such a group both cooperates and receives the benefits of
cooperation of a other cooperators, V(C|a+1), for as long as the group persists.
Replacing that T_a individual with a U individual causes other reciprocators to
cease cooperating after the first interaction. This term cannot be positive unless
the fitness of a cooperator in such a group is greater than the fitness of a defector
in a group of n defectors, that is, V(C|a+1) > 0. Suppose that this condition is
satisfied. Then, if the expected number of interactions is large enough (i.e., w is
close enough to one), T_a individuals will have an advantage relative to U
individuals in groups in which a of the other n - 1 individuals are reciprocators.
For T_a to be favored by selection, the advantage that T_a individuals gain in such
groups must exceed the advantage to U individuals in all other groups.

With this result in mind, consider the equilibrium behavior of this model.
The frequency of the two strategies in the population will not change when
p' = p. Values of p that satisfy this condition are equilibrium values, denoted p̂.
Since there is no migration or mutation, p̂ = 1 (all T_a individuals) and p̂ = 0 (all U
individuals) are always equilibrium values of equation (10). There may be other
equilibria at which both U and T_a are present in the population. At these
polymorphic (or “interior”) equilibria, the average fitness of the two strategies
must be equal. An equilibrium is stable if the population returns to the same
equilibrium frequency after small perturbations. Stable equilibria are interesting
because they tell us something about what kinds of strategies, or mixes of
strategies, can persist in the long run. Unstable equilibria are also interesting
because they give information about the range of initial conditions that can result
in various long-run outcomes. Such an analysis yields the following results.

A population in which U is common can resist invasion by any reciprocating
strategy, T_a. This is true for all values of w. As in the two-person case, a
population that is all unconditional defectors can resist invasion by any reciprocal
strategy we consider. When unconditional defection is very common and groups
are formed randomly, most groups contain n unconditional defectors. The
few T_a individuals in the population will be in groups in which all other
individuals are unconditional defectors. These solitary reciprocators cooperate
once and thereafter defect. The average fitness of unconditional defectors will
always be higher than the average fitness of any reciprocal strategy, because
V(D|0) > V(C|1).

A population in which T_{n-1} is common can resist invasion by unconditional
defection if, and only if, w is sufficiently large. It is the only reciprocal strategy that
has this property. T_{n-1} is the reciprocating strategy that is completely intolerant
of defection. Individuals using T_{n-1} will cooperate only if every other individual
cooperated during the previous time period. Strategies that continue to
cooperate despite one or more defections (T_a, 0 < a < n - 1) cannot be evolutionarily
stable when groups form randomly. When T_a is common, the great majority
of unconditional defectors will be isolated in groups in which the other n - 1
individuals are all reciprocators. Unless a = n - 1, the T_a individuals in such
groups will continue to cooperate despite the defector. Since V(D|n-1) > V(C|n),
unconditional defectors will have higher average fitness than reciprocators.

The parameter w is a measure of the number of times that individuals
interact in groups. T_{n-1} is evolutionarily stable only if:

$$w > w_c = 1 - \frac{V(C \mid n)}{V(D \mid n-1)} \tag{12}$$

This relationship has a simple interpretation. Consider an individual in a group in
which all other individuals use the strategy T_{n-1}. If this individual defects on every
turn, his payoff will be V(D|n-1) in the first time period and V(D|0) = 0
thereafter. If he instead cooperates, his payoff is V(C|n) every period. Because the
average number of interactions is 1/(1 - w), condition (12) requires that the
average payoff from choosing cooperation be greater than the average payoff from
choosing defection if cooperation is to resist invasion by individuals using U.
More iterations mean more chance of satisfying this condition, all else being equal.
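For the linear payoffs of equation (4), V(C|n) = B - c and V(D|n-1) = B(n - 1)/n, so the critical value in (12) has a closed form. A brief Python sketch under that assumption (the function names are illustrative):

```python
def critical_w(n, B, c):
    """w_c from condition (12), specialized to the linear payoffs of equation (4)."""
    v_c_all = B - c                 # V(C|n): everyone cooperates
    v_d_rest = B * (n - 1) / n      # V(D|n-1): lone defector among cooperators
    return 1.0 - v_c_all / v_d_rest


def resists_defectors(n, B, c, w):
    """True when a T_{n-1} population resists invasion by rare U individuals."""
    return w > critical_w(n, B, c)
```

For pairs with B = 3 and c = 2 this gives w_c = 1/3, which matches the familiar two-person tit-for-tat condition w > (T - R)/(T - P) with T = 1.5, R = 1, and P = 0.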

Assuming linear payoffs, the domain of attraction of T_{n-1} diminishes rapidly as
group size increases. If pairs of individuals interact long enough, either
unconditional defection or T_{n-1} can persist. How likely is it that a population will end up
at the cooperative equilibrium? One approach to answering this question is to
determine the domain of attraction of the two equilibria. An equilibrium’s
domain of attraction is the set of initial frequencies that begin trajectories ending at
that equilibrium. The bigger the domain of attraction of an equilibrium, the
more likely it is, in some sense, that a population will end up there. (Later we
will consider a second approach to answering this question.)

We have not been able to determine the domains of attraction for the two
fixed equilibria in general. We have found them, however, in the special case
in which the payoffs are linear functions of the number of defectors. Only two
stable equilibria exist in this special case, p = 0 and p = 1. There is also a single
unstable polymorphic equilibrium. The frequency of reciprocators at the
internal equilibrium, p_c, is (Appendix, part 1):

$$p_c = \left[\frac{c - B/n}{w(B-c)/(1-w)}\right]^{1/(n-1)} \tag{13}$$

If the initial frequency is higher than p_c, then the population eventually will
consist of all reciprocating (T_{n-1}) individuals. If the initial frequency of
cooperators is less than p_c, the population eventually will consist entirely of U
individuals.

To interpret equation (13), remember that the expected fitness of the two
strategies must be equal at any polymorphic equilibrium. The term c - B/n is the
difference in fitness between U and T_{n-1} individuals during the first interaction.
The term w(B - c)/(1 - w) is the fitness advantage of T_{n-1} relative to U when the
other n - 1 members of the group are reciprocators. The critical frequency of
T_{n-1} individuals necessary for selection to favor T_{n-1} thus is simply the ratio of
the first-interaction advantage of defecting to the long-run advantage of
reciprocating, raised to the 1/(n - 1) power. Because the long-run advantage increases as
the expected number of interactions becomes large (i.e., as w → 1), the threshold
frequency of cooperators necessary for cooperation to increase approaches zero
(i.e., p_c → 0). The domain of attraction for the unconditional defection
equilibrium thus shrinks toward zero. Raising the ratio to the power 1/(n - 1), however,
means that the threshold frequency of cooperators necessary for cooperators to
be favored, p_c, increases as group size increases. This effect occurs because the
probability of forming cooperative groups diminishes geometrically as group size
increases when groups are formed at random.
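The threshold can be computed directly. The sketch below is an illustration, not the authors' code: it sets the two sides of condition (11) equal for a = n - 1 with linear payoffs and binomial groups of n - 1 other members, which gives w(B - c)/(1 - w) · p^(n-1) = c - B/n and hence an exponent of 1/(n - 1). The function name is an assumption.

```python
def critical_frequency(n, B, c, w):
    """Threshold frequency p_c of T_{n-1}, assuming linear payoffs and
    random group formation. Values of 1 or more mean T_{n-1} cannot
    increase at any frequency (condition (12) fails)."""
    first_round_cost = c - B / n                  # advantage of defecting once
    long_run_gain = w * (B - c) / (1 - w)         # advantage of sustained cooperation
    return (first_round_cost / long_run_gain) ** (1.0 / (n - 1))
```

With B = 2, c = 1, and w = 0.99 (constant B), the threshold rises from roughly 0.17 for groups of 4 to roughly 0.86 for groups of 32.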

Figure 8.1 illustrates the magnitude of this effect by showing the values of p_c
for various parameter combinations. For small groups, cooperators need increase
to only a small fraction of the population for selection to favor cooperation. For
even modest-sized groups, however, the cooperative strategy T_{n-1} must reach
substantial frequency before this strategy increases. For large groups, virtually
the entire population must consist of cooperators before the cooperative strategy
can increase.

Figure 8.1. This figure presents the threshold frequency of T_{n-1} that must be exceeded
for this strategy to increase (i.e., p_c) as a function of group size (n) for four values of w:
0.9, 0.99, 0.999, and 0.9999. These values of w correspond to
10, 100, 1000, and 10000 interactions, on average, between pairs of individuals. (a) The
incremental benefit to individual due to one cooperator is proportional to group size
(B = 1.141n); (b) the incremental benefit is constant (B = 2).

In populations composed of T_a (n - 1 > a > 0) and U, there is a single stable,
internal equilibrium as long as w is large enough, c < B(a + 1)/n, and payoffs are
linear. Of the set of reciprocating strategies we have considered, we have found
that only T_{n-1} can resist invasion by rare unconditional defectors (U). We also
found, however, that T_{n-1} is unlikely to increase when rare. It would be
interesting to know whether there are any stable internal equilibria at which more
tolerant cooperative strategies (T_a, a < n - 1) and unconditional defection
coexist. It seems plausible that the threshold frequency necessary to get such
strategies started in a population might be lower.

It turns out that there are two internal equilibria, one stable and the other
unstable, as long as (see the Appendix, part 2):

$$\frac{B(a+1)}{n} - c > 0 \quad\text{and}\quad w > \frac{c - B/n}{(c - B/n)\,\mathrm{Prob}(j < a \mid p = p_d) + a\,(B/n)\,\mathrm{Prob}(j = a \mid p = p_d)} \tag{14}$$

The frequency of T_a at the stable internal equilibrium, p_s, is always greater than
p_d, and the frequency of T_a at the unstable equilibrium, p_u, is less than p_d. If the
initial frequency of T_a is less than p_u, the population will eventually consist of all
unconditional defectors. When the initial frequency of T_a is greater than p_u, the
frequency of T_a eventually will stabilize at p_s. When w is less than this critical
value, the only equilibria are monomorphic for T_a or U.

Numerical determination of the internal stable equilibria suggests that as a
decreases (1) the frequency of the strategy T_a at the stable internal equilibrium decreases,
(2) the threshold frequency of T_a necessary for T_a to be favored decreases, and (3) the
threshold value of w necessary for the internal equilibria to exist increases. One can
determine the frequency of the two strategies at these polymorphic equilibria by
finding the values of p for which W(U) = W(T_a). Figure 8.2 shows the results
of numerical determinations of these equilibrium values for several combinations
of parameter values. When a is almost n - 1, reciprocators will allow only a few
defectors before defecting themselves. In this case, the frequency of the
reciprocating strategy, T_a, is high, but so is the threshold frequency of T_a necessary to
get cooperation started. Note also that when a is near n - 1, the internal
equilibrium may be stable even when w is fairly small. As a decreases, the
reciprocating strategy tolerates a larger number of defectors. This greater tolerance
decreases both the frequency of the cooperative individuals at the stable equilibrium
and the threshold frequency of T_a necessary to get cooperation started. As a decreases,
w must be larger in order for a stable equilibrium to exist at all.
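One way to carry out such a numerical determination is to scan p for sign changes in W(T_a) - W(U) and refine each by bisection. The Python sketch below assumes linear payoffs and binomial group formation; it is not the procedure the authors used, and the function names, grid size, and tolerance are illustrative assumptions.

```python
from math import comb


def fitness_gap(p, a, n, B, c, w):
    """W(T_a) - W(U) under random group formation and linear payoffs."""
    gap = 0.0
    for j in range(n):  # j = number of T_a among the other n - 1 individuals
        m_j = comb(n - 1, j) * p ** j * (1 - p) ** (n - 1 - j)
        coop = B * (j + 1) / n - c   # V(C|j+1)
        defect = B * j / n           # V(D|j)
        F_T = coop / (1 - w) if j >= a else coop
        F_U = defect / (1 - w) if j > a else defect
        gap += m_j * (F_T - F_U)
    return gap


def internal_equilibria(a, n, B, c, w, grid=1000, tol=1e-10):
    """Interior frequencies where the fitness gap changes sign, in ascending order."""
    roots = []
    points = [k / grid for k in range(1, grid)]
    for lo, hi in zip(points, points[1:]):
        g_lo = fitness_gap(lo, a, n, B, c, w)
        g_hi = fitness_gap(hi, a, n, B, c, w)
        if g_lo * g_hi < 0:
            while hi - lo > tol:  # bisection keeps the sign of g_lo at lo
                mid = (lo + hi) / 2
                if fitness_gap(mid, a, n, B, c, w) * g_lo > 0:
                    lo = mid
                else:
                    hi = mid
            roots.append((lo + hi) / 2)
    return roots
```

For n = 8, a = 5, B = 2, c = 1, and w = 0.99 the scan finds two interior equilibria, the lower one unstable (the gap turns positive above it) and the upper one stable. With w = 0.5 it finds none.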

Populations at stable equilibria involving two strategies, T_a and U (n - 1 >
a > 0), can resist invasion by rare individuals using any other reciprocating strategy,
T_b, where a ≠ b. So far we have limited our analysis to populations in which only
two strategies are present. This omission might be important. Assuming w is
sufficiently large, it is relatively easier for cooperation to get started when
cooperating individuals are quite tolerant. But tolerant strategies can achieve only
a low frequency at equilibrium. Suppose that such an equilibrium is reached. If
less tolerant individuals could then invade, the population might reach a new
equilibrium at which cooperators existed in higher frequency. If this could
happen repeatedly, then the cooperators might eventually achieve a high
frequency through a sort of ratchet mechanism.

Figure 8.2. Plots of the two internal equilibria in populations characterized by two
strategies, T_a and U, for various parameter values for n = 32 and B = 2. Part (a) shows
how to determine the values of the two internal equilibria for a given value of 1/(1 - w).
Part (b) shows how these values are affected by changes in the parameter a, the
cooperation threshold of reciprocators. Axis: frequency of reciprocators (p).


It turns out, however, that a population at a stable polymorphic equilibrium
involving U and T_a can resist invasion by any other rare reciprocating strategy, T_b.
For a third strategy to invade, its expected fitness must be greater than the fitness
of the two common strategies, which are equal to each other at equilibrium. When the
invading strategy is sufficiently rare, expected fitness of T_b individuals can be
calculated assuming that the other n - 1 individuals in their groups are drawn
from the equilibrium population. It turns out that (see the Appendix, part 3) any
invading type has lower fitness than the common reciprocating strategy, T_a. To
see this, suppose that b > a, so that the invading strategy is less tolerant of
defection than the reciprocating strategy common at the equilibrium. First, recall
that T_a individuals have higher fitness than unconditional defectors only in groups
in which there are a other T_a individuals. In all other groups, unconditional
defectors have the advantage. Now consider the fitness of T_b individuals. If there
are a T_a individuals in the group, a T_b individual does almost as poorly as an
unconditional defector, because her defection causes cooperation to collapse. In
groups with any other composition, T_b individuals either act and thus suffer like
T_a individuals, or they defect after one interaction, thus beating the T_a
individuals but losing to the unconditional defectors. The strategy T_b therefore can
neither capture the benefits of long-term cooperation in groups in which there are
a threshold number of cooperators nor exploit the cooperation of the common
reciprocators as effectively as unconditional defection. The Appendix shows that
a similar logic holds for a > b.


The Evolution of Reciprocity When Groups Form Assortatively 

Nonrandom interaction plays an important role in Axelrod’s (1984) influential
view of the evolution of reciprocity. Like most evolutionary analyses of
reciprocity (but see Peck and Feldman, 1986; Boyd and Lorberbaum, 1987),
Axelrod’s study indicates reciprocating strategies such as tit-for-tat cannot increase
when rare if individuals interact at random. Axelrod shows, however, that
reciprocal strategies can increase when rare if individuals pair assortatively, meaning
that individuals using reciprocating strategies are more likely to be paired with other
reciprocators than chance alone would dictate. In genetic models, such
assortative social interactions could arise if individuals tend to interact with genetic
relatives. If w is large, even a very small amount of assortative interaction will
allow reciprocating strategies to increase. Thus, in the two-person case, there is
a synergistic relationship between kin selection and reciprocity in which small
amounts of kin selection greatly facilitate the evolution of cooperation through
reciprocity. We now consider whether this synergistic relationship changes as
group size increases.

Once again suppose that payoffs are linear and that there are only two
strategies: reciprocators who cooperate as long as a or more others also cooperate,
T_a, and unconditional defection, U. Also suppose that groups are formed so that
the probability that a T_a individual is in a group with j other T_a individuals is:

$$m(j \mid T_a) = \binom{n-1}{j}\,[r + (1-r)p]^{j}\,[(1-r)(1-p)]^{n-1-j} \tag{15}$$

where p is the frequency of T_a in the population before group formation, and r is
a measure of assortment (e.g., the relatedness coefficient of kin selection theory).
The probability that an unconditional defector finds himself in a group in which j
of the other n - 1 individuals are T_a is:

$$m(j \mid U) = \binom{n-1}{j}\,[(1-r)p]^{j}\,[r + (1-r)(1-p)]^{n-1-j} \tag{16}$$

This model is meant to capture the general notion of assortative group formation
in a mathematically tractable form. It is consistent with some genetic models,
for example, a model in which strategies are inherited as haploid sexual traits and
group members are half siblings. There are many other plausible modes of group
formation that will not yield exactly this pattern of group formation, for
example, groups of full siblings. Because the contingent strategies we consider
cause payoffs to be highly nonlinear functions of the number of reciprocators,
experience with kin selection models (Cavalli-Sforza and Feldman, 1978)
suggests that different patterns of group formation may yield different results. Our
model nonetheless has generality when used to determine the conditions under
which a reciprocating strategy can invade a population in which unconditional
defection is common, because many of these alternative models of assortative group
formation become approximately equivalent to equations (15) and (16) when
one strategy is rare.
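The assortative group-formation probabilities of equations (15) and (16) can be sketched as follows. This is an illustrative Python sketch; the function names are assumptions, while p, r, n, and j follow the text.

```python
from math import comb


def m_given_reciprocator(j, n, p, r):
    """m(j|T_a), equation (15): probability that a T_a individual has
    j T_a groupmates among the other n - 1 members."""
    q = r + (1 - r) * p   # chance that each groupmate is T_a
    return comb(n - 1, j) * q ** j * (1 - q) ** (n - 1 - j)


def m_given_defector(j, n, p, r):
    """m(j|U), equation (16): probability that a U individual has
    j T_a groupmates among the other n - 1 members."""
    q = (1 - r) * p       # chance that each groupmate is T_a
    return comb(n - 1, j) * q ** j * (1 - q) ** (n - 1 - j)
```

Both are binomial distributions over the other n - 1 members. Setting r = 0 recovers random group formation, and any r > 0 raises the expected number of reciprocators a T_a individual meets above the number a U individual meets.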

With these assumptions, one can show that T_a can increase when rare only if

$$\underbrace{(B/n)\,[(n-1)r + 1] - c}_{\text{“inclusive fitness effect”}} + \underbrace{\frac{w}{1-w}\sum_{j=a}^{n-1}\left[\frac{B(j+1)}{n} - c\right] m(j \mid T_a)}_{\text{“reciprocity effect”}} > 0 \tag{17}$$

As the frequency of reciprocators, p, approaches zero, the probability that a
reciprocator finds itself in a group with j other reciprocators, m(j|T_a), becomes
approximately

$$m(j \mid T_a) \approx \binom{n-1}{j}\,r^{j}\,(1-r)^{n-1-j} \tag{18}$$

Selection can favor cooperative behavior when there is assortative social
interaction even with no possibility of reciprocity, because cooperators are more
likely than defectors to benefit from the cooperation. The first term on the
left-hand side of (17) represents this inclusive fitness effect (Hamilton, 1975). This
term indicates that even if w is zero, T_a can increase as long as the inclusive
fitness of T_a individuals is higher than that of unconditional defectors. In the
present context, the most interesting cases are ones in which the first term is
negative, meaning that cooperation could not be favored without reciprocity.
The second term on the left-hand side of (17) gives the effect of reciprocity
when reciprocators are rare. The added benefit received by reciprocators in
groups in which there are more than a reciprocators is the increase in fitness per
interaction (B(j + 1)/n - c) times the number of additional interactions during
which reciprocators receive the benefit (w/(1 - w)). Reciprocity will aid the
spread of strategies like T_a as long as benefits produced by cooperation in a group
of a + 1 cooperators exceed the costs (B(a + 1)/n - c > 0).

There is a striking synergistic relationship between kin selection and
reciprocity when pairs of individuals interact (Axelrod and Hamilton, 1981). A small
degree of assortative social interaction, coupled with the possibility of long-term
reciprocal relationships, can lead to extensive cooperation in situations in which
neither factor alone would cause cooperation. This synergy diminishes very
rapidly as group size increases according to (17). When r is small and a is a
substantial fraction of n - 1, the reciprocity effect in (17) becomes approximately
proportional to the probability that a of the other n - 1 individuals in the group
are reciprocators. When r is small and a/(n - 1) ≫ r, this probability diminishes
very rapidly as n increases. The clearest case is when a = n - 1. For a given B, c,
and r, the expected number of interactions after the first must increase as (1/r)^{n-1}
for the magnitude of the reciprocity effect to remain constant.
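Condition (17) can be rearranged into a threshold on the expected number of interactions, 1/(1 - w) = 1 + w/(1 - w). The Python sketch below uses the rare-reciprocator approximation (18) and linear payoffs; it illustrates the calculation under those assumptions, and the function names are hypothetical.

```python
from math import comb, inf


def invasion_terms(a, n, B, c, r):
    """Return (inclusive fitness effect, reciprocity effect per unit w/(1-w))
    from condition (17), using approximation (18) for m(j|T_a)."""
    inclusive = (B / n) * ((n - 1) * r + 1) - c
    reciprocity = sum(
        (B * (j + 1) / n - c) * comb(n - 1, j) * r ** j * (1 - r) ** (n - 1 - j)
        for j in range(a, n)
    )
    return inclusive, reciprocity


def threshold_interactions(a, n, B, c, r):
    """Smallest expected number of interactions 1/(1 - w) above which
    rare T_a can increase; inf if no value of w suffices."""
    inclusive, reciprocity = invasion_terms(a, n, B, c, r)
    if inclusive > 0:
        return 1.0   # favored even with a single interaction (w = 0)
    if reciprocity <= 0:
        return inf
    return 1.0 + (-inclusive) / reciprocity
```

With B = 3, c = 2, r = 0.1, and a = n - 1, the threshold is 4.5 interactions for pairs, about a thousand for groups of 4, and more than ten million for groups of 8. These values depend on the illustrative B and c chosen here.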

Figure 8.3 illustrates the dramatic nature of this effect. It plots the threshold
values of 1/(1 - w) necessary for T_a to increase when rare as given by expression
(17). We see that assortative group formation may play a significant role in getting
reciprocal cooperation started when groups are small. For example, for n = 3 and
a = 2, even very small amounts of assortment (e.g., r = 1/32) will cause selection
to favor reciprocity even when w is quite small (e.g., individuals interact roughly
10 times). When groups are larger, however, no amount of assortment will cause
selection to favor reciprocity unless w reaches extremely high values. Consider
n = 16 and a = 15. When r = 1/2, cooperation is favored without reciprocity.
When r = 1/4, individuals must interact roughly 10 million times if reciprocity is
to be favored. When a < n - 1, the qualitative picture is similar. T_a can increase
when rare under a somewhat wider range of group sizes, but it remains true that
the reciprocity effect diminishes rapidly as group size increases.

Figure 8.3. This figure presents the threshold values of 1/(1 - w) that must be
exceeded if the strategy T_a is to increase when rare as a function of group size (n) for
four values of r: 1/4, 1/8, 1/16, and 1/32. (a) a = n - 1; (b) a = (3/4)(n - 1).


Conclusion 

Reciprocity is likely to evolve only when reciprocating groups are quite small.
Previous research based on the repeated two-person prisoner’s dilemma game
indicates that pairwise reciprocity will often evolve. Here we have modeled social
interaction within groups of n individuals as a repeated n-person prisoner’s
dilemma game and asked under what conditions selection will favor strategies
leading to reciprocal cooperation. In general, increasing the size of interacting
social groups reduces the likelihood that selection will favor reciprocating
strategies. For quite small groups, the results parallel the two-person case. For larger
groups, however, the conditions under which reciprocity can evolve become
extremely restrictive. This result satisfies the natural historian’s conventional
wisdom: large, cooperative groups composed of distantly related individuals are
unusual in nature. But it leaves human cooperation unexplained.

Reciprocal strategies must satisfy two competing desiderata to succeed. First, 
to persist when common, they must prevent too many defectors in the popula- 
tion from receiving the benefits of long-term cooperation. The threshold number 
of cooperators thus must be a substantial fraction of group size. Second, to in- 
crease when rare, there must be a substantial probability that the groups with the 
threshold number of cooperators will form. This problem is not great when pairs 
of individuals interact; a relatively small degree of assortative group formation 
will allow reciprocating strategies to increase. As groups become larger,
however, this desideratum can be satisfied only if the threshold number of cooperators
is fairly small, or the degree of assortment in the formation of groups is large.

Our model omits many features that may be important in potentially co- 
operative social interactions. We suspect that three of the most important 
missing features are as follows: 

1 . No internal sanctions. We precluded the possibility that individuals could 
directly punish defectors. A cooperator in the n-person prisoner’s dilemma can 
punish a defector only by withholding future cooperation — which also punishes 
other cooperators. Cooperation might flourish under a wider range of conditions 
if cooperators could focus punishment on defectors alone. 

2. No internal structure. Our groups have no internal structure. Cooperation 
might arise in larger groups if individuals interact in some kind of network or 
hierarchy. 

3. Oversimplified game structure. Much of our analysis presumed linear
payoffs. Several authors have argued that other games may be equally important
for our understanding of cooperation. Hirshleifer (1983) has shown that the
nature of the payoff schedule as a function of number of cooperators has
important effects on motivation to cooperate. Kelley and Thibaut (1978) discuss a
large array of mixed-motive games that characterize various social interactions,
and Taylor and Ward (1982) argue that the n-person version of the game
“chicken” is essential to understanding cooperation. It may be that the prisoner’s
dilemma with linear payoffs is particularly demanding for the evolution of
cooperation and that other models would allow the evolution of cooperation in
sizable groups under a wider range of conditions.

Omitting these features certainly argues for caution in interpreting our 
results. But including these features would not necessarily allow reciprocity to 
evolve in large groups. It is especially unclear what peculiarities of the human 
case allow us to violate the generalization to which both theory and the natural 
history of nonhuman animals point: the evolution of large cooperative societies 
normally depends more on kin selection than reciprocity. Elsewhere we argue 
that cultural analogs of kin and group selection are indeed promising mechan- 
isms to explain human cooperation (Boyd and Richerson, 1982, 1985, chs. 7 and 
8). Campbell (1983) hypothesizes that effects like those we have modeled here 
suffice to explain the scale of cooperation observed in simpler human societies, 
but not in the state-level societies of the last 5,000 years. The range of plausible
arguments is still quite broad. But the sharp decline in the tendency of
reciprocity to evolve as a function of group size, and the apparent rarity of
cooperation in large groups of nonkin in nature, command attention. At the
very least, our results suggest we should view with substantial skepticism and 
subject to more searching analysis explanations of human cooperation based on 
reciprocity. 


APPENDIX 


1. With linear payoffs, and w large enough, there is a single, unstable internal
equilibrium at which the frequency of T_{n-1} is given by equation (13). At any interior
equilibrium, W(T_a) = W(U). With linear payoffs, this requires that

$$(B/n - c)\Bigl[1 - w + w\sum_{j=a+1}^{n-1} m(j)\Bigr] + w\bigl(B(a+1)/n - c\bigr)\,m(a) = 0 \tag{A1}$$

If w is large enough that (12) is satisfied, and a = n - 1, this equation can be satisfied
only for one value of p, that given in (13). Since both of the boundary equilibria are
stable when (12) is satisfied, the interior equilibrium is unstable.

2. If n - 1 > a > 0, payoffs are linear, and both conditions in (14) are satisfied,
then there are two interior equilibria p = p_u and p = p_s such that p_u < p_d < p_s. p = p_u is
unstable, and p = p_s is stable. Equation (A1) can be rewritten as follows:

$$h(p) = w(c - B/n)\bigl(1 - I_p(a, n-a)\bigr) + wB(a/n)\,m(a) = c - B/n \tag{A2}$$

where I_p(x, y) is the incomplete beta function. First, notice that c - B/n > h(0) =
w(c - B/n) > h(1). Next, differentiating h(p) in (A2) with respect to p yields this:

$$\frac{d}{dp}h(p) = w(n-1-a)\,m(a-1)\bigl[B(a+1)/n - c - p(B-c)\bigr] \tag{A3}$$

If B(a+1)/n - c < 0, h(p) is monotonically decreasing, and therefore there are no
values of p in the interval (0, 1) that satisfy (A2); thus no interior equilibria exist. If
B(a+1)/n - c > 0, h(p) is unimodal with a maximum at p = p_d, where p_d has the
value given in (14). Thus, if h(p_d) > c - B/n, there are two values of p that satisfy
(A1), and if h(p_d) < c - B/n, there are none. Clearly for small enough w, h(p_d) <
c - B/n, and thus there are no interior equilibria. Similarly, since h(p_d) > w(c - B/n),
for w close enough to one, h(p_d) > c - B/n, and there are two interior equilibria.
Further, since h(p_d) is a linear function of w, there is some value of w, w_d, such that
there are no interior equilibria for 0 < w < w_d, and there are two interior equilibria for
w_d < w < 1. By solving (A1) for w and setting p = p_d, one obtains the expression for w_d
given in the text.

From (10), the derivative of p' with respect to p evaluated at an interior equilibrium
point p = p̂, denoted L, is the following:

$$L = 1 + \frac{a\,[w/(1-w)]\,m(a \mid p=\hat{p})\,\bigl[B(a+1)/n - c - \hat{p}(B-c)\bigr]}{W_0 + F(U \mid p=\hat{p})} \tag{A4}$$

Thus, if p̂ < p_d, L > 1, and the equilibrium is unstable. If p̂ > p_d, L < 1. As long as W_0
is large enough, L > -1, and thus the equilibrium is stable.
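
The root structure described in part 2 can be checked numerically. The following Python sketch is illustrative only. It assumes that m(j) is the binomial probability that j of the other n - 1 group members are reciprocators, so that I_p(a, n - a) equals the upper binomial tail. The function names and the parameter values are ours, not part of the original analysis.

```python
from math import comb


def m(j: int, n: int, p: float) -> float:
    """Assumed binomial probability that j of the other n - 1 members are T_a."""
    return comb(n - 1, j) * p**j * (1 - p) ** (n - 1 - j)


def h(p: float, n: int, a: int, B: float, c: float, w: float) -> float:
    """Left-hand side of (A2); the binomial tail stands in for I_p(a, n - a)."""
    tail = sum(m(j, n, p) for j in range(a, n))
    return w * (c - B / n) * (1 - tail) + w * B * (a / n) * m(a, n, p)


def interior_equilibria(n, a, B, c, w, steps=1000, tol=1e-10):
    """Scan (0, 1) for sign changes of h(p) - (c - B/n), then refine by bisection."""
    target = c - B / n

    def g(p):
        return h(p, n, a, B, c, w) - target

    roots = []
    grid = [i / steps for i in range(1, steps)]
    for lo, hi in zip(grid, grid[1:]):
        if g(lo) * g(hi) < 0:
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots


if __name__ == "__main__":
    n, a, B, c = 4, 2, 3.0, 1.0            # hypothetical values with B > c > B/n
    p_d = (B * (a + 1) / n - c) / (B - c)  # zero of the bracket in (A3)
    print("p_d =", p_d)
    print("w = 0.9:", interior_equilibria(n, a, B, c, 0.9))
    print("w = 0.2:", interior_equilibria(n, a, B, c, 0.2))
```

With these hypothetical values the scan finds two interior roots bracketing p_d when w = 0.9 and none when w = 0.2, which matches the argument that interior equilibria appear only once w exceeds a threshold.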

3. Populations at stable equilibria involving two strategies, T_a and U (n - 1 >
a > 0), can resist invasion by rare individuals using any other reciprocating strategy,
T_b, where a ≠ b.

When T_b is sufficiently rare, we can ignore the probability that groups with more
than one T_b individual will occur. This means that the fitness of T_b individuals will
depend only on the number of T_a individuals in their group. First, suppose that
a > b. Then for j ≥ a, or b > j, F(T_a | j) = F(T_b | j). For a > j ≥ b, F(T_a | j) = B(j+1)/n - c,
while F(T_b | j) = B(j+1)/n - c - w(c - B/n) < F(T_a | j) by definition. Thus, in
this case, the expected fitness of the invading type is lower than that of the common
reciprocator. Next, suppose that a < b. Then for j ≥ b, or a > j, F(T_a | j) = F(T_b | j). For
b > j > a, F(T_a | j) = [B(j+1)/n - c]/(1 - w), while F(T_b | j) = [B(j+1)/n - c] +
w[Bj/n]/(1 - w) > F(T_a | j) for values of w consistent with the existence of an interior
equilibrium. For j = a, F(T_a | j) = [B(j+1)/n - c]/(1 - w), while F(T_b | j) =
[B(j+1)/n - c] + wBj/n < F(T_a | j) for values of w consistent with the existence of an
interior equilibrium. Thus,

$$W(T_b) - W(T_a) = w\,m(a)\left[B(a/n) - \frac{B(a+1)/n - c}{1-w}\right] + \frac{w}{1-w}\sum_{j=a+1}^{b-1} m(j)\,(c - B/n) \tag{A5}$$

By using (11) to eliminate terms containing m(a), (A5) becomes:

$$W(T_b) - W(T_a) = w\sum_{j=b}^{n-1} m(j)\,\frac{B/n - c}{1-w} + w\sum_{j=0}^{a-1} m(j)\,(B/n - c),$$

which is always less than zero.


NOTE 

We thank Joan Silk and John Wiley for extremely useful comments on previous 
drafts of this chapter.
Punishment Allows the Evolution of Cooperation
(or Anything Else) in Sizable Groups


Human behavior is unique in that cooperation and division of labor 
occur in societies composed of large numbers of unrelated individuals. In other 
eusocial species, such as social insects, societies are made up of close genetic 
relatives. According to contemporary evolutionary theory, cooperative behavior 
can be favored by selection only when social groups are formed so that co- 
operators are more likely to interact with other cooperators than with non- 
cooperators (Hamilton, 1975; Brown, Sanderson, and Michod, 1982; Nunney, 
1985). It is widely agreed that kinship is the most likely source of such nonrandom
social interaction. Human society is thus an unusual and interesting
special case of the evolution of cooperation. 

A number of authors have suggested that human eusociality is based on 
reciprocity (Trivers, 1971; Wilson, 1975; Alexander, 1987), supported by our
more sophisticated mental skills to keep track of a large social system. It seems 
unlikely, however, that natural selection will favor reciprocal cooperation in 
sizable groups. An extensive literature (reviewed by Axelrod and Dion, 1989; 
also see Hirshleifer and Martinez-Coll, 1988; Boyd, 1988; Boyd and Richerson, 
1989) suggests that cooperation can arise via reciprocity when pairs of individuals
interact repeatedly. These results indicate that the evolutionary equilibrium
in this setting is likely to be a contingent strategy with the general form:
“cooperate the first time you interact with another individual, but continue to 
cooperate only if the other individual also cooperates.” Several recent articles 
(Joshi, 1987; Bendor and Mookherjee, 1987; Boyd and Richerson, 1988, 1989)
present models in which larger groups of individuals interact repeatedly in po- 
tentially cooperative situations. These analyses suggest that the conditions under 
which reciprocity can evolve become extremely restrictive as group size increases
above a handful of individuals.

In most existing models, reciprocators retaliate against noncooperators by 
withholding future cooperation. In many situations other forms of retaliation are 
possible. Noncooperators could be physically attacked, be made the targets of 
gossip, or denied access to territories or mates. We will refer to such alternative 
forms of punishment as retribution. It seems possible that selection may favor 
cooperation enforced by retribution even in sizable groups of unrelated in- 
dividuals because, unlike withholding reciprocity, retribution can be made only 
against noncooperators, and because the magnitude of the penalty imposed on 
noncooperators is not limited by an individual’s effect on the outcome of coop- 
erative behavior. 

Here, we extend the theory of the evolution of cooperation to include the 
possibility of retribution. We review models of the evolution of
reciprocity in sizable groups and present a model of the evolution of cooperation 
enforced by retribution. An analysis of this model suggests that retribution can 
lead to the evolution of cooperation in two qualitatively different ways. 

1. If the long-run benefits of cooperation to a punishing individual are
greater than the costs to that single individual of coercing all other 
individuals in a group to cooperate, then strategies that cooperate 
and punish noncooperators, strategies that cooperate only if pun- 
ished, and, sometimes, strategies that cooperate but do not punish 
coexist at a stable equilibrium or in stable oscillations.

2. If the costs of being punished are large enough, “moralistic” strat- 
egies that cooperate, punish noncooperators, and punish those who 
do not punish noncooperators can be evolutionarily stable. 

We also show, however, that moralistic strategies can cause any individually 
costly behavior to be evolutionarily stable, whether or not it creates a group 
benefit. Once enough individuals are prepared to punish any behavior, even the 
most absurd, and to punish those who do not punish, then everyone is best off 
conforming to the norm. Moralistic strategies are a potential mechanism for 
stabilizing a wide range of behaviors. 


Models of the Evolution of Reciprocity 

Models of the evolution of reciprocity among pairs of individuals share many 
common features. Each assumes that there is a population of individuals. Pairs of 
individuals are sampled from this population and interact a number of times. 
During each interaction individuals may either cooperate (C) or defect (D). The
incremental fitness effects of each behavior define a single period prisoner’s di- 
lemma, and, therefore, cooperative behavior is altruistic in the sense that it reduces 
the fitness of the individual performing the cooperative behavior but increases 
fitness of the other individual in the pair (Axelrod and Hamilton, 1981; Boyd, 
1988). Each individual is characterized by an inherited strategy that determines
how he will behave. Strategies may be fixed rules like unconditional defection
(“always defect”) or contingent ones like tit-for-tat (“cooperate during the first
interaction; subsequently do whatever the other individual did last time”). The
pair’s two strategies determine the effect of the entire sequence of interactions on 
each pair member’s fitness. An individual’s contribution to the next generation is 
proportional to his fitness. 

Analysis of such models suggests that lengthy interactions between pairs of 
individuals are likely to lead to the evolution of reciprocity. Reciprocating 
strategies, like tit-for-tat, leading to mutual cooperation, are successful if pairs of 
individuals are likely to interact many times. A population in which uncondi- 
tional defection is common can resist invasion by cooperative strategies under a 
wide range of conditions. However, there seem to be a variety of plausible 
mechanisms that allow reciprocating strategies to increase when rare. Axelrod 
and Hamilton (1981) and Axelrod (1984) have shown that a very small degree of
assortative group formation, when coupled with the possibility of prolonged
reciprocity, allows strategies like tit-for-tat to invade noncooperative populations.
Other mechanisms have been suggested by Peck and Feldman (1985),
Boyd and Lorberbaum (1987), and Feldman and Thomas (1987).

Recent work suggests that these conclusions do not apply to larger groups. 
Joshi (1987) and Boyd and Richerson (1988) have independently analyzed a
model in which n individuals are sampled from a larger population and then 
interact repeatedly in an n-person prisoner’s dilemma. In this model, cooperation 
is costly to the individual, but beneficial to the group as a whole. This work 
suggests that increasing the size of interacting social groups reduces the likelihood
that selection will favor reciprocating strategies. As in the two-individual
case, if groups persist long enough, both reciprocal and noncooperative behavior
are favored by selection when they are common. For large groups, however, the
conditions under which reciprocity can increase when rare become extremely
restrictive. Bendor and Mookherjee (1987) show that when errors occur, reciprocal
cooperation may not be favored in large groups even if they persist
forever. Boyd and Richerson (1989) derived qualitatively similar results for a model in which
groups were structured into simple networks of cooperation. 

Intuitively, increasing group size places reciprocating strategies on the horns 
of a dilemma. To persist when common, they must prevent too many defectors in 
the population from receiving the benefits of long-term cooperation. Thus, re- 
ciprocators must be provoked to defect by the presence of even a few defectors. 
To increase when rare, there must be a substantial probability that the groups 
with fewer than this number of defectors will form. This problem is not great when
pairs of individuals interact; a relatively small degree of assortative group for- 
mation will allow reciprocating strategies to increase. As groups become larger, 
however, both of these requirements can be satisfied only if the degree of as- 
sortment in the formation of groups is extreme. 

This result should be interpreted with caution. Modeling social interaction as 
an n-person prisoner’s dilemma means that the only way a reciprocator can punish 
a defector is by withholding future cooperation. There are two reasons to suppose 
that cooperation might be more likely to evolve if cooperators could retaliate 
in some other way. First, in the n-person prisoner’s dilemma, a reciprocator
who defects in order to punish defectors induces other reciprocators to defect.
These defections induce still more defections. More discriminating retribution 
would allow defectors to be penalized without generating a cascade of defection. 
Second, in the n-person prisoner’s dilemma the severity of the sanction is limited 
by an individual’s effect on the whole group, which becomes diluted as group 
size increases. Other sorts of sanctions might be much more costly to defec- 
tors and therefore allow rare cooperators to induce others to cooperate in large 
groups. 

There is also a problem with retribution. Why should individuals punish? If 
being punished is sufficiently costly, it will pay to cooperate. However, by as- 
sumption, the benefits of cooperation flow to the group as a whole. Thus, as long 
as administering punishment is costly, retribution is an altruistic act. Punishment 
is beneficial to the group but costly to the individual, and selection should fa- 
vor individuals who cooperate but do not punish. This problem is sometimes 
referred to as the problem of “second-order” cooperation (Oliver, 1980; 
Yamagishi, 1986).

A recent article by Axelrod (1986) illustrates the problem of second-order
cooperation. Axelrod analyzes a model in which groups of individuals interact 
for two periods. During the first period individuals may cooperate or defect in an 
n-person prisoner’s dilemma, and in the second, individuals who cooperated on 
the first move have the opportunity to punish those individuals who did not 
cooperate at some cost to themselves. Axelrod shows that punishment may 
expand the range of conditions under which cooperation could evolve. However, 
the strategy of cooperating but not punishing was precluded. As Axelrod notes, 
such second-order defecting strategies would always do better because second- 
order punishment of nonpunishers is not possible. 

The problem of second-order cooperation has been partly solved by 
Hirshleifer and Rasmusen (1989). They consider a game theoretic model in
which a two-stage game consisting of a cooperation stage followed by a punish- 
ment stage is repeated a number of times. They show that if punishment is 
costless, then the strategy of cooperating, punishing noncooperators, and pun- 
ishing nonpunishers is what game theorists call a “perfect equilibrium.” (The 
perfect equilibrium is a generalization of the Nash equilibrium that is useful in 
repeated games. See Rasmusen, 1989, for an excellent introductory discussion of 
game theoretic equilibrium concepts.) Because it is a game theoretic model, it
does not provide information about the evolutionary dynamics. It also seems 
possible that if the model were extended to an infinite number of periods, a 
similar strategy would be evolutionarily stable even if punishment is costly. 

Here we consider evolutionary properties of an infinite period model of 
cooperation with the possibility of punishment that is similar to Hirshleifer and 
Rasmusen’s. We will perform the analysis in three stages. First, we describe the 
basic structure of the model. Then, we consider populations in which there are 
cooperators who punish defection and a variety of strategies that initially defect 
and then respond to punishment in different ways. The goal is to investigate the 
evolutionary dynamics introduced by retribution without the complications in- 
troduced by second-order defection and second-order punishment. Finally, we 
consider the effects of these complications. 
Description of the Model

Suppose that groups of size n are sampled from a large population and interact 
repeatedly. The probability that the group persists from one interaction to the 
next is w, and thus the probability that it persists for t or more interactions is 
w^(t-1). Each interaction consists of two stages, a cooperation stage followed by a
punishment stage. During the cooperation stage an individual can either coop- 
erate (C) or defect (D). The incremental effect of a single cooperation stage on 
the fitness of an individual depends on that individual’s behavior and the be- 
havior of other members of the group as follows: let the number of other in- 
dividuals choosing C during a particular turn be i. Then the payoffs to individuals 
choosing C and D are: 


$$V(C \mid i) = (b/n)(i + 1) - c \tag{1}$$

$$V(D \mid i) = (b/n)\,i \tag{2}$$


where b > c and c > b/n. Increasing the number of cooperators increases the
payoff for every individual in the group, but each cooperator would be better off 
switching to defection. (This special case of the n-person prisoner’s dilemma has 
been used in economics and political science to represent provision of public 
goods [Hardin, 1982]. It is also identical to the linear model of social interactions 
used in most kin selection models.) During the punishment stage any individual 
can punish any other individual. Punishing another individual lowers the fitness 
of the punisher an amount k and the fitness of the individual being punished an 
amount p. 
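
The payoff structure of the cooperation stage can be written out directly. The sketch below is a minimal Python illustration of payoffs (1) and (2). The function names and the numerical values are hypothetical and chosen only to satisfy b > c > b/n.

```python
def payoff_cooperate(i: int, n: int, b: float, c: float) -> float:
    """V(C|i): payoff to a cooperator when i other group members cooperate."""
    return (b / n) * (i + 1) - c


def payoff_defect(i: int, n: int, b: float) -> float:
    """V(D|i): payoff to a defector when i other group members cooperate."""
    return (b / n) * i


def is_prisoners_dilemma(n: int, b: float, c: float) -> bool:
    """The two conditions stated in the text: b > c and c > b/n."""
    return b > c and c > b / n


if __name__ == "__main__":
    n, b, c = 8, 4.0, 1.0  # hypothetical parameter values
    assert is_prisoners_dilemma(n, b, c)
    for i in range(n):
        # Whatever the others do, defecting pays more by exactly c - b/n.
        gain = payoff_defect(i, n, b) - payoff_cooperate(i, n, b, c)
        print(i, gain)
    # Yet universal cooperation beats universal defection for everyone.
    print(payoff_cooperate(n - 1, n, b, c), payoff_defect(0, n, b))
```

The loop shows the individual temptation to defect, and the final line shows the group-level gain from cooperation, which together make the game a prisoner's dilemma.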

Each individual is characterized by an inherited “strategy” that specifies how 
she will behave during any time period based on the history of her own behavior 
and the behavior of other members of the group up to that point. The strategy 
specifies whether the individual will choose cooperation or defection during the 
cooperation stage and which other individuals, if any, she will punish during the 
punishment stage. Strategies can be unconditional rules like the asocial rule “never 
cooperate/never punish.” They can also be contingent rules like “always cooperate/ 
punish all individuals who didn’t cooperate during the cooperation stage.” 

We assume that individuals sometimes make errors. In particular, we sup- 
pose that any time an individual’s strategy calls for cooperation, there is a 
probability e > 0 that the individual will instead defect “by mistake.” This is the 
only form of error we investigate. Individuals who mean to defect always defect, 
and individuals always either punish or do not punish according to the dictates of 
their strategy. 

Groups are formed according to the following rule: the conditional probability
that any other randomly chosen individual in a group has a given strategy
S_i, given that the focal individual also has S_i, is given by:

$$\Pr(S_i \mid S_i) = r + (1 - r)\,q_i \tag{3}$$

where q_i is the frequency of the strategy S_i in the population before social
interaction, and 0 ≤ r ≤ 1. The conditional probability that any other randomly
chosen individual in a group has some other strategy S_j, given that the focal
individual has S_i, is given by:

$$\Pr(S_j \mid S_i) = (1 - r)\,q_j \tag{4}$$

When r = 0, social interaction occurs at random. When r > 0, social interaction is 
assortative. There is a chance r of drawing an individual with the same strategy as 
the focal individual and a chance 1 — r of picking an individual at random from 
the population (who will also be identical to the focal individual with probability 
equal to the frequency of the focal individual’s strategy in the population). If 
strategies are inherited as haploid sexual traits, r is just the coefficient of relat- 
edness. For other genetic models, r is not exactly equal to the coefficient of 
relatedness. However, it is a good approximation for rare strategies and thus is 
useful for determining the conditions under which a rare reciprocating strategy 
can invade a population in which all defection is common. 
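
The assortment rule amounts to mixing a point mass on the focal strategy with the population frequencies. A minimal Python sketch follows. The function name, the strategy labels, and the example frequencies are hypothetical.

```python
def partner_probabilities(focal: str, freqs: dict, r: float) -> dict:
    """Probability that a random group mate has each strategy, given the focal one.

    With chance r the partner shares the focal strategy; with chance 1 - r the
    partner is drawn at random from the population frequencies.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return {
        s: (r if s == focal else 0.0) + (1.0 - r) * q
        for s, q in freqs.items()
    }


if __name__ == "__main__":
    freqs = {"P": 0.1, "R1": 0.9}  # hypothetical population frequencies
    print(partner_probabilities("P", freqs, r=0.0))  # random interaction
    print(partner_probabilities("P", freqs, r=0.2))  # assortative interaction
```

The returned probabilities always sum to one, and setting r = 0 recovers the population frequencies, which is the random-interaction case.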

After all social interactions are completed, individuals in the population 
reproduce. The probability of reproduction is determined by the results of social 
behavior. Thus, the frequency of a particular strategy, S_i, in the next generation,
q_i', is given by:

$$q_i' = \frac{q_i\,W(S_i)}{\sum_j q_j\,W(S_j)} \tag{5}$$


where W(S_i) is the average payoff of individuals using strategy S_i in all groups
weighted by the probability that different types of groups occur. (As argued by
Brown et al., 1982, this assumption is consistent with haploid genetic inheritance
of strategies and some simple forms of cultural transmission.) We then ask,
which strategies or combinations of strategies can persist? 
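
One generation of this updating rule can be sketched in a few lines of Python. The function name and the example numbers are hypothetical, and the check that mean fitness is positive is our own addition to keep the division well defined.

```python
def next_generation(freqs: dict, fitness: dict) -> dict:
    """Apply q_i' = q_i W(S_i) / sum_j q_j W(S_j) to every strategy."""
    mean_fitness = sum(freqs[s] * fitness[s] for s in freqs)
    if mean_fitness <= 0:
        raise ValueError("mean fitness must be positive")
    return {s: freqs[s] * fitness[s] / mean_fitness for s in freqs}


if __name__ == "__main__":
    freqs = {"A": 0.5, "B": 0.5}    # hypothetical starting frequencies
    fitness = {"A": 2.0, "B": 1.0}  # hypothetical average payoffs W(S_i)
    for generation in range(5):
        freqs = next_generation(freqs, fitness)
        print(generation + 1, freqs)
```

Strategies with above-average fitness increase in frequency, and the frequencies continue to sum to one after each update.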


Results 

No Second-Order Defection 

First, we analyze the evolutionary dynamics of retribution with second-order 
defection excluded. To do this, we consider a world in which only the following 
two strategies are possible. 

Cooperator-punishers (P). During each interaction (1) cooperate, 
and (2) punish all individuals who did not cooperate during the coop- 
eration stage. 

Reluctant cooperators (R_1). Defect until punished once, then cooperate
forever. Never punish.

We temporarily exclude strategies that cooperate but do not punish to 
eliminate the possibility of second-order defection. We also exclude strategies 
that continue to defect after one act of punishment. This latter assumption is not 
harmless. We show in the Appendix that if R_1 is replaced by unconditional
defection, then (1) cooperation is much less likely to evolve, and (2) R_1 may not
be able to invade a population in which unconditional defection is common. This
analysis is justified for two reasons: first, it provides a best case for the evolution
of cooperation, and, second, there is abundant empirical evidence that organisms 
do respond to punishment. 

When groups are formed at random (r = 0), such a population can persist at 
one of three stable equilibria (or ESSs):

• All individuals are R_1: no one cooperates.

• All individuals are P: everyone cooperates.

• Most individuals are R_1, but a minority are P: most are induced to
cooperate by the punishing few.

In what follows we describe and interpret the conditions under which each of 
these ESSs can exist. Proofs are given in the Appendix. 
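
To make the two-strategy model concrete, the expected payoffs of P and R_1 can be computed for random group formation (r = 0). The Python sketch below is our own bookkeeping from the rules stated above, not the derivation given in the Appendix. It assumes that every punisher punishes every defector at cost k per act, that each act costs its target p, that a reluctant cooperator switches to cooperation after the first interaction whenever at least one punisher is present, and that later interactions carry total weight w/(1 - w). All function names and parameter values are hypothetical.

```python
from math import comb


def fitness_punisher(x, n, b, c, k, p, e, w):
    """Expected payoff to one P in a group holding x punishers in total (x >= 1)."""
    # First interaction: punishers try to cooperate, reluctant cooperators defect.
    first = ((b / n) * (x - 1) * (1 - e)      # benefit from the other punishers
             + (1 - e) * (b / n - c)          # own cooperation, when it succeeds
             - k * ((n - x) + (x - 1) * e)    # punishing every defector
             - e * p * (x - 1))               # punished after an own mistake
    # Every later interaction: all n members try to cooperate.
    later = (1 - e) * (b - c) - k * e * (n - 1) - e * p * (x - 1)
    return first + w / (1 - w) * later


def fitness_reluctant(x, n, b, c, p, e, w):
    """Expected payoff to one R_1 in a group holding x punishers."""
    if x == 0:
        return 0.0  # nobody punishes, so nobody ever cooperates
    first = (b / n) * x * (1 - e) - p * x
    later = (1 - e) * (b - c) - e * p * x
    return first + w / (1 - w) * later


def mean_fitness(q, n, b, c, k, p, e, w):
    """Average payoffs (W(P), W(R_1)) when P has frequency q and r = 0."""
    w_p = w_r = 0.0
    for j in range(n):  # j = number of punishers among the other n - 1 members
        prob = comb(n - 1, j) * q**j * (1 - q) ** (n - 1 - j)
        w_p += prob * fitness_punisher(j + 1, n, b, c, k, p, e, w)
        w_r += prob * fitness_reluctant(j, n, b, c, p, e, w)
    return w_p, w_r


if __name__ == "__main__":
    params = dict(n=8, b=4.0, c=1.0, k=0.1, p=1.0, e=0.001)  # hypothetical
    for w in (0.2, 0.5):
        print("w =", w, "rare P:", mean_fitness(0.0, w=w, **params))
    print("common P:", mean_fitness(1.0, w=0.5, **params))
```

Comparing W(P) with W(R_1) at q = 0 and at q = 1 indicates, under these assumptions, whether each pure population resists invasion for a given set of parameters.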

Reluctant cooperators resist invasion by the cooperating, punishing strategy 
whenever the cost to a cooperator-punisher of cooperating and punishing n — 1 re- 
luctant cooperators exceeds the benefit to that punisher that results from the coop- 
eration that is induced by his punishment. It can be shown that the responsive 
defecting strategy R_1 can be invaded by the cooperating, punishing strategy P as long as:

$$\underbrace{k(n-1) + (c - b/n)}_{\text{initial cost of cooperating and punishing}} < \underbrace{\frac{w}{1-w}(b - c) - \frac{e\,k(n-1)}{(1-w)(1-e)}}_{\text{long-run benefit induced by punishing}} \tag{6}$$

When cooperator-punishers are rare, and groups are formed at random, virtually 
all cooperator-punishers will find themselves in a group in which the other n - 1
individuals are defectors. The left-hand side of (6) gives the fitness loss associated
with cooperating, and then punishing n - 1 defectors during the first interaction.
The right-hand side of (6) gives the long-term net fitness benefit of the cooperation
that results from punishment. The term w(b - c)/(1 - w) is the long-term
fitness benefit from the induced cooperation by R_1 individuals, and the term
proportional to e is the long-run cost that results from having to punish erro- 
neous defections. Thus, if this term is positive, P can invade if w is large enough. 

If the cooperator-punisher strategy, P, can increase when rare, punishing is 
not altruistic. Retribution induces cooperation that creates benefits sufficient to 
compensate for its cost. The longer groups persist, the larger the benefit asso- 
ciated with cooperation. Thus, as long as error rates are low or the benefits of 
cooperation are large, longer interactions will permit cooperative strategies to 
invade, even if groups are formed at random. Also notice that the condition for 
R_1 to be invaded does not depend on p, the cost of being punished. As one would
expect, increasing the group size or the error rate makes it harder for the co- 
operative strategy to invade. 
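
The invasion condition can be turned into a threshold on w. The Python sketch below reads condition (6) as k(n - 1) + (c - b/n) < w(b - c)/(1 - w) - ek(n - 1)/((1 - w)(1 - e)); the exact form of the error term is our reconstruction from the model's assumptions and should be treated as such. The function names and the example values are hypothetical.

```python
def punishers_can_invade(n, b, c, k, e, w):
    """Check condition (6), as read above, for rare P in a population of R_1."""
    initial_cost = k * (n - 1) + (c - b / n)
    long_run_benefit = w / (1 - w) * (b - c) - e * k * (n - 1) / ((1 - w) * (1 - e))
    return initial_cost < long_run_benefit


def invasion_threshold(n, b, c, k, e):
    """Smallest w above which the condition holds, or None if no w < 1 works.

    Multiplying the inequality by (1 - w) and solving for w gives
    w > (cost + E) / (b - c + cost), with E = e k (n - 1) / (1 - e).
    """
    cost = k * (n - 1) + (c - b / n)
    error_cost = e * k * (n - 1) / (1 - e)
    threshold = (cost + error_cost) / (b - c + cost)
    return threshold if threshold < 1 else None


if __name__ == "__main__":
    for n in (4, 8, 16, 32):  # hypothetical group sizes
        print(n, invasion_threshold(n, b=4.0, c=1.0, k=0.1, e=0.001))
```

In this sketch the threshold rises with group size and with the error rate, in line with the remark that both make invasion harder, and it does not involve p at all.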

The cooperating-punishing strategy, P, is evolutionarily stable as long as 


$$\underbrace{p(n-1)}_{\text{cost of being punished}} > \underbrace{c - b/n + \frac{e\,k(n-1)}{(1-w)(1-e)}}_{\text{cost of cooperating and punishing}} \tag{7}$$
The first term on the right-hand side of (7) gives the cost of cooperating during
one interaction; the term on the left-hand side is the cost of being punished by
n - 1 other individuals, and the second term on the right-hand side is the cost
of punishing mistakes over the long run. The rare R_1 individual suffers the cost
of punishment but avoids the cost of cooperating on the first turn and the cost of
punishing erroneous defection over the long run. Notice that this condition is
independent of the long-run expected benefit associated with cooperation (because
it does not contain terms of the form b/(1 - w)). It depends only on the
cost of the cooperation to the individual and the costs of punishing and being 
punished. Thus, retribution can stabilize cooperation, but this stability does not 
result from the mutual benefits of cooperation. 
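
The stability condition can be checked the same way. The Python sketch below evaluates p(n - 1) > c - b/n + ek(n - 1)/((1 - w)(1 - e)). The function name and the example values are hypothetical.

```python
def punishers_are_stable(n, b, c, k, p, e, w):
    """Check condition (7): can common P resist invasion by rare R_1?"""
    cost_of_being_punished = p * (n - 1)
    cost_of_cooperating_and_punishing = (
        (c - b / n) + e * k * (n - 1) / ((1 - w) * (1 - e))
    )
    return cost_of_being_punished > cost_of_cooperating_and_punishing


if __name__ == "__main__":
    params = dict(n=8, b=4.0, c=1.0, k=0.1, p=0.2, e=0.01)  # hypothetical
    for w in (0.5, 0.9, 0.99, 0.999):
        print(w, punishers_are_stable(w=w, **params))
```

Because the only term involving w is the long-run cost of punishing mistakes, raising w can only make the condition harder to satisfy. With these values P is stable at w = 0.9 but not at w = 0.999.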

There is a stable internal equilibrium at which both P and R_1 are present
whenever (1) neither R_1 nor P is an ESS, or (2) R_1 is not an ESS but P is, and the
condition (A14) given in the Appendix is satisfied. We have not been able to
derive an expression for the frequency of P at the internal equilibrium. Figure 9.1
shows the frequency of P at this equilibrium determined numerically as a
function of the expected number of interactions (log(1/(1 - w))) for various
group sizes. When groups persist for only a few interactions, both P and R_1 are
ESSs. Increasing the number of interactions eventually destabilizes R_1 and allows
a stable internal equilibrium to exist. Further increases in the expected number 
of interactions destabilize P, leaving the internal equilibrium as the only stable 
equilibrium. 

Without second-order defection, cooperation can persist at two qualitatively 
different equilibria: either cooperative strategies coexist with noncooperative 
strategies at a polymorphic equilibrium, or all individuals in the population are co- 
operative. When the cooperator-punisher strategy is very rare, it will increase 
whenever the benefit from long-run cooperation to an individual punisher ex- 
ceeds the cost of the punishment necessary to induce reluctant cooperators to 
cooperate. As cooperator-punishers become more common, more reluctant co- 
operators find themselves in groups with at least one cooperator-punisher, and 
thus they enjoy the benefits of long-run cooperation without bearing the costs 
associated with punishing. Thus the relative fitness of cooperator-punishers de- 
clines. As cooperator-punishers become still more common, reluctant coop- 
erators are punished more harshly during the initial interaction and their relative 
fitness declines. 

Assortative group formation has both positive and negative effects on the 
conditions under which cooperator-punishers evolve. When there is assortative 
group formation, individuals are more likely to find themselves in groups with 
others like themselves than chance alone would dictate. Such assortment de- 
creases the cost of cooperating and punishing because cooperators are more 
likely to receive the benefits that result from the cooperative acts of others than 
are noncooperators and because cooperator-punishers need to punish fewer 
noncooperators on the first interaction. However, assortment decreases the long- 
run benefit associated with punishment because cooperator-punishers are more 
likely to be punished for erroneous defection. (Assortment increases the amount 
of punishment that an inadvertently defecting cooperator-punisher receives.)
The second effect becomes more pronounced the longer groups last because
cooperator-punishers will make more errors. The negative effect will predominate
whenever the following condition is satisfied:

$$(1 - e)(b/n + k) < \frac{e\,p}{1 - w} \tag{8}$$

When expression (8) is satisfied, assortment increases the range of conditions
under which R_1 is an ESS, decreases the range of conditions under which P is an
ESS, and, if a stable internal equilibrium exists, decreases the frequency of P at
that equilibrium. Note that the negative effects increase as the expected number
of interactions increases. When (8) is not satisfied, increasing r decreases the range
of parameters under which R_1 is an ESS, increases the range under which P is an
ESS, and may either increase or decrease the frequency of P at internal equilibria.

[Figure omitted. Axis label: Frequency of P.]

Figure 9.1. The equilibrium frequency of P for a given expected number of interactions
for different group sizes (n = 8, 16, 32) assuming that e = 0.001. For these parameter
values populations consisting of all P are always at a stable equilibrium. Populations
without P individuals are also always an equilibrium, but it may be either stable or
unstable. To find the polymorphic equilibria, pick a number of expected interactions
and group size, and then determine the frequencies of P at which the horizontal line
at that value of log(1/(1 - w)) intersects the curve at that value of n. If the horizontal
line lies below the curve for some q_P, then the frequency of P increases; if it lies above
the curve, the frequency of P decreases. Thus, if there is only one polymorphic equilibrium
(e.g., n = 4, log(1/(1 - w)) = 1), it is unstable and q_P = 0 is stable. If there are two
polymorphic equilibria (e.g., n = 16, log(1/(1 - w)) = 3), the polymorphic equilibrium
with the lower frequency of P is stable, and the other polymorphic equilibrium and
q_P = 0 are both unstable. Finally, if there is no polymorphic equilibrium (e.g., n = 8,
log(1/(1 - w)) = 4), the only stable equilibrium is q_P = 1.
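
The trade-off between the two effects of assortment can be evaluated numerically. The Python sketch below reads condition (8) as (1 - e)(b/n + k) < ep/(1 - w), which is our reconstruction from the model's assumptions rather than a quotation of the original typesetting. The function names and the example values are hypothetical.

```python
def assortment_hurts_punishers(n, b, k, p, e, w):
    """Check condition (8), as read above: does the negative effect predominate?"""
    gain_per_like_partner = (1 - e) * (b / n + k)  # shared benefit, less punishing
    loss_per_like_partner = e * p / (1 - w)        # punished for own mistakes
    return gain_per_like_partner < loss_per_like_partner


def critical_persistence(n, b, k, p, e):
    """Value of w above which the negative effect predominates (may be negative)."""
    return 1 - e * p / ((1 - e) * (b / n + k))


if __name__ == "__main__":
    params = dict(n=8, b=4.0, k=0.1, p=1.0, e=0.001)  # hypothetical
    print("critical w:", critical_persistence(**params))
    for w in (0.9, 0.99, 0.999):
        print(w, assortment_hurts_punishers(w=w, **params))
```

With a low error rate the negative effect appears only when groups persist for a very long time, which fits the observation that it grows with the expected number of interactions.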
Second-Order Defection

When punishers are common, cooperation is favored because cooperative in- 
dividuals avoid punishment. Thus, if punishment is costly, punishment may be 
an altruistic act. It is costly to the individual performing the punishment but 
benefits the group as a whole. This argument suggests that individuals who 
cooperate, but do not punish, should be successful. In the previous model (and
that of Axelrod, 1986) cooperators always punish noncooperators, and thus this
conjecture could not be addressed. To allow for second-order defection, consider
a model in which P and $R_1$ compete with the following strategy.

Easy-going cooperator (E): Always cooperate, never punish 

When second-order defection is possible, neither E nor P is ever an ESS. A 
population in which P is common can always be invaded by E, because easygoing 
cooperators get the benefits of cooperation without incurring the cost of
enforcement. A population in which E is sufficiently common can always be
invaded by $R_1$, because reluctant cooperators can enjoy the benefits of cooperation
without fear of punishment.

$R_1$ is an ESS whenever punishment does not pay (i.e., (6) is not satisfied). At
this ESS, there is no cooperation because reluctant cooperators behave as
unconditional defectors. If the long-run benefits of cooperation to an individual are
not sufficient to offset the cost of coercing all the other members of the group to
cooperate, the noncooperators can resist invasion by punishing or cooperating
strategies. Persistent noncooperation is not the only possible outcome, however,
under this condition. If P can resist invasion by $R_1$ (i.e., (7) is satisfied), then
simulation studies indicate that there may be persistent oscillations involving all
three strategies. Such oscillations seem to require that the cost of being punished
is much greater than the cost of punishing ($p \gg k$) and the benefits of
cooperation barely exceed the cost ($b \approx c$).

If punishment does pay, the long-run outcome is a mix of reluctant co- 
operators who coexist with cooperator-punishers and, sometimes, easygoing 
cooperators. This can happen in three different ways: 

• There can be a stable mix of reluctant cooperators and cooperator- 
punishers. Such a stable equilibrium exists anytime there is a stable 
polymorphic equilibrium on the $R_1$-P boundary in the absence of
E. If, in addition, P is not an ESS in the absence of E, this mixture of 
reluctant cooperators and cooperator-punishers is the only stable 
equilibrium, and numerical simulations suggest that the polymorphic 
equilibrium is globally stable. Thus, at equilibrium, populations will 
consist of a majority of reluctant cooperators with a minority of 
cooperator-punishers. E cannot invade because rare E individuals 
often find themselves in groups without a cooperator-punisher and 
thus pay the cost of cooperation without receiving the long-run 
benefits of cooperation. Punishers in all groups receive the benefits
of long-term cooperation.
• If there is no polymorphic equilibrium on the $R_1$-P boundary (i.e.,
in the absence of E), then there is a single interior equilibrium point
at which all three strategies are present. We have not been able to 
derive an expression for the frequencies of the three traits at these 
interior equilibria or determine when they are stable. Numerical 
simulation indicates that when an interior equilibrium exists, it is 
almost always stable. 

• The mixture of all three strategies can oscillate. When P is stable in 
the absence of E, the frequencies of the three strategies may oscillate 
indefinitely. Simulation studies suggest that this outcome only oc- 
curs under relatively rare parameter combinations. 

In each case, as group size increases, the average frequency of cooperative 
strategies typically declines to a quite low level. However, the average frequency 
of groups with at least one P individual, and therefore groups in which coop- 
eration occurs over the long run, can remain at substantial levels even when 
groups are large. One must keep in mind, however, that this conclusion pre- 
supposes that individual punishers can afford to punish every noncooperator in 
the group. A model in which the capacity to punish is limited would presumably 
stabilize at some higher frequency of punishers as group size increased. 
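
The contrast between a low frequency of punishers and a high frequency of cooperating groups can be illustrated numerically. The following is a minimal sketch, assuming groups are formed at random so that a group of size n contains no P individual with probability (1 − q)^n, where q is the population frequency of P. The function name and the sample values are hypothetical and are not taken from the model's simulations.

```python
def share_of_groups_with_punisher(q: float, n: int) -> float:
    """Probability that a randomly formed group of size n contains at least
    one P individual when P has frequency q in the population.

    Assumption: random group formation, so the number of P individuals in a
    group is binomial and Pr(no P) = (1 - q) ** n.
    """
    return 1.0 - (1.0 - q) ** n


if __name__ == "__main__":
    # Illustrative values only: a rare punisher type in groups of growing size.
    for group_size in (4, 16, 64):
        print(group_size, round(share_of_groups_with_punisher(0.05, group_size), 3))
```

Under this assumption, holding q fixed, the share of groups with at least one punisher rises with group size, which is one way to read the statement that cooperating groups can remain common even when punishers are rare.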


Moralistic Strategies 

The results of the previous section suggest that strategies that attempt to induce 
cooperation through retribution can always be invaded when they are common 
by strategies that cooperate but do not punish. However, such is not the case. 
Consider the following strategy. 

Moralists (M): Always cooperate, and punish individuals who are not 
in “good standing.” Individuals are in good standing if they have be- 
haved according to M since the last time they were punished or the 
beginning of the interaction. 

Thus, moralists punish individuals who do not cooperate. But they also punish 
those who do not punish noncooperators and those who do not punish non- 
punishers. Each M individual punishes others at most once per turn. Once an in- 
dividual is punished, he can avoid further punishment by cooperating, punishing 
noncooperators, and punishing nonpunishers (thus returning to good standing). 

Moralists can resist invasion by reluctant cooperators ($R_1$) whenever the
following is true:

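$$\underbrace{(n-1)\,p\left[1 + \frac{w}{1-w}\left(1 - (1-e)^n\right)\right]}_{\text{cost of being punished}} > \underbrace{\frac{e(n-1)(k+p)}{1-w} + (1-e)(c - b/n)}_{\text{cost of cooperating and punishing}} \tag{9}$$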

The left-hand side of inequality (9) gives the cost to an $R_1$ individual of being
punished. It is proportional to the number of interactions because such reluctant
cooperators are punished every time there is an error. The right-hand side is the
cost of cooperating and punishing. Thus, as long as the error rate is not exactly
zero, moralists can resist invasion by $R_1$ under a wider range of conditions than
can P.

Moralists can resist invasion by easygoing cooperators (E) whenever the 
following condition is satisfied: 

$$\left(1 - (1 - e)^n\right) wp > ek \tag{10}$$

If errors occur only infrequently ($ne \ll 1$), then this condition simplifies to
become $nwp > k$. Thus, unless punishing is much more costly than being punished,
moralists can resist invasion by easygoing cooperators.
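
The relation between condition (10) and its small-error simplification can be checked numerically. The sketch below assumes condition (10) has the form (1 − (1 − e)^n)wp > ek and the simplified form nwp > k; the function names and the parameter values are hypothetical.

```python
def moralist_resists_easygoing(n: int, e: float, w: float, p: float, k: float) -> bool:
    """Condition (10): (1 - (1 - e)**n) * w * p > e * k."""
    return (1.0 - (1.0 - e) ** n) * w * p > e * k


def moralist_resists_easygoing_approx(n: int, w: float, p: float, k: float) -> bool:
    """Small-error simplification of (10), valid when n * e << 1: n * w * p > k."""
    return n * w * p > k


if __name__ == "__main__":
    # Rare errors: the exact condition and the approximation agree.
    print(moralist_resists_easygoing(8, 0.001, 0.9, 1.0, 5.0),
          moralist_resists_easygoing_approx(8, 0.9, 1.0, 5.0))
    # Frequent errors (n * e is not small): the approximation can mislead.
    print(moralist_resists_easygoing(8, 0.3, 0.9, 1.0, 7.0),
          moralist_resists_easygoing_approx(8, 0.9, 1.0, 7.0))
```

The second call is there to show that the simplification should only be trusted when ne is small.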

In fact, as Hirshleifer and Rasmusen (1989) have pointed out, moralistic 
aggression of this kind is a recipe for stabilizing any behavior. Notice that neither
condition (9) nor (10) involves terms representing the long-run benefits of
cooperation (i.e., terms of the form b/(1 − w)). When M is common, rare individuals
deviating from M are punished; otherwise, they have no effect on the behavior of 
the group. Thus, as long as being punished by all the other members of the group 
is sufficiently costly compared to the individual benefits of not behaving according 
to M, M will be evolutionarily stable. It does not matter whether or not the 
behavior produces group benefits. The moralistic strategy could require any ar- 
bitrary behavior — wearing a tie, being kind to animals, or eating the brains of dead 
relatives. Then M could resist invasion by individuals who refuse to engage in the 
arbitrary behavior unless punished, as long as condition (9) was satisfied (where 
c − b/n is the cost of the behavior), and resist invasion by individuals who perform
the behavior but do not punish others, as long as (10) is satisfied. 


Discussion 

Our results suggest that problems of second-order cooperation can be overcome 
in two quite different ways. First, even though retribution creates a group benefit,
it need not be altruistic. If defectors respond to punishment by a single individual 
by cooperating, and if the long-run benefits to the individual punisher are greater 
than the costs associated with coercing other group members to cooperate, then 
the strategy that cooperates and punishes defectors can increase when rare and 
will continue to increase until an interior equilibrium is reached. At this equi- 
librium, the punishing strategy coexists with strategies that initially defect but 
respond to punishment by cooperating and, sometimes, strategies that cooperate 
but do not punish. For plausible parameter values, the punishing strategy is rarer 
than the other two strategies at such an equilibrium. However, since a single 
punisher is sufficient to induce cooperation, cooperating groups are nonetheless 
quite common. 

Increasing group size reduces the likelihood that this mechanism will lead to 
the evolution of cooperation because it increases the cost of coercion. This
effect, however, is not nearly so strong as in previous models in which defection was
punished by withdrawal of cooperation. In those models (Joshi, 1987; Boyd and
Richerson, 1988, 1989), a linear increase in group size requires an exponential
increase in the expected number of interactions necessary for cooperation to
increase when rare. In the present model, the same increase in group size
requires only a linear increase in the expected number of interactions.

Moralistic strategies that punish defectors, individuals who do not punish 
noncooperators, and individuals who do not punish nonpunishers can also over- 
come the problem of second-order cooperation. When such strategies are common, 
rare noncooperators are selected against because they are punished. Individuals 
who cooperate but do not punish are selected against because they are also pun- 
ished. In this way, selection may favor punishment, even though the cooperation 
that results is not sufficient to compensate individual punishers for its costs. 

It is not clear whether moralistic strategies can ever increase when rare. We 
have not presented a complete analysis of the dynamics of moralistic strategies 
because to do so in a sensible way would require the introduction of additional 
strategies, a consideration of imperfect monitoring of punishment, and a con- 
sideration of more general temporal patterns of interaction. We conjecture, 
however, that the dynamics will be roughly similar to the dynamics of P and $R_1$
in the case in which there is no stable internal equilibrium: both defecting and 
moralistic strategies will be evolutionarily stable. Increasing the degree of as- 
sortment will mean that moralists will have fewer defectors to punish but will be 
punished more when they err. Assortative social interaction will not interact 
with group benefits in a way that will allow moralistic strategies to increase. 

It is also interesting that moralistic strategies stabilize any behavior. The con- 
ditions that determine whether M can persist when rare are independent of the 
magnitude of the group benefit created by cooperation. The moralistic strategy 
could stabilize any behavior equally well, whether it is beneficial or not. If our 
conjecture about the dynamics of M is correct, then the dynamics will not be 
strongly affected by whether or not the sanctioned behavior is group-beneficial.

This result is reminiscent of the “folk theorem” from mathematical game 
theory. This theorem holds that in the repeated prisoner’s dilemma with a 
constant probability of termination (the case analyzed by Axelrod and most 
other evolutionary theorists), strategies leading to any pattern of behavior can be
a game theoretic perfect equilibrium (Rasmusen 1989). The proof of this the- 
orem relies on the fact that if there is enough time available (on average) for 
punishment, then individuals can be induced to adopt any pattern of behavior. 
Thus, in games without a known endpoint, game theory may predict that any- 
thing can happen. This result, combined with the fact that nobody lives forever, 
has led many economists to restrict their analyses to games with known end- 
points. The diversity of equilibria here and in the nonevolutionary analysis can be 
regarded as a flaw or embarrassment for the analysis. 

We prefer to take these results as telling us something about the evolution of 
social behavior. Games without a known endpoint seem to us to be a good 
model for many social situations. Although nobody lives forever, social groups 
often persist much longer than individuals. When they do, individuals can expect 
to be punished until their own last act. Even dying men are tried for murder, and 
in many societies one’s family is also subject to retribution. If one accepts this 
argument, then it follows that moralistic punishment is inherently diversifying in 
the sense that many different behaviors may be stabilized in exactly the same 
environment. It may also provide the basis for stable among-group variation.
Such stable among-group variation can allow group selection to be an important
process (Boyd and Richerson 1985, 1990a,b), leading to the evolution of
behaviors that increase group growth and persistence.


Conclusion 

Cooperation enforced by retribution is strikingly different from reciprocity in 
which noncooperation is punished by withdrawal of cooperation. We think two 
features of this system are interesting and warrant further study: 

1. Cooperation may be possible in larger groups than is the case with
reciprocity. This effect invites further study of the limitations on the 
ability of single individuals to punish and how coalitions of punishers 
might or might not be able to induce reciprocity in very large groups. 

2. In the model studied here, punishers collect private benefit by 
inducing cooperation in their group that compensates them for 
punishing, while providing a public good for reluctant cooperators. 
There are often polymorphic equilibria in which punishers are rel- 
atively rare, generating a simple political division of labor reminis- 
cent of the “big man” systems of New Guinea and elsewhere. This 
finding invites study of further punishment strategies. Consider, for 
example, strategies that punish but do not cooperate. Such in- 
dividuals might be able to coerce more reluctant cooperators than 
cooperator-punishers and therefore support cooperation in still 
larger groups. If so, such models might help explain the evolution of 
groups organized by full-time specialized, “parasitical” coercive 
agents like tribal chieftains. 

The importance of the study of retribution can hardly be overestimated.
The evolution of political complexity in human societies over the last few 
thousand years depended fundamentally on the development of a variety of 
coercive strategies similar to those we have investigated here. 


APPENDIX 

SENSITIVITY OF THE MODEL TO THE RESPONSE TO PUNISHMENT 

The effects of punishment on the evolution of cooperation are strongly affected by 
the extent to which a defector responds to punishment by cooperating. To see this, 
consider a game in which cooperator-punishers (P) compete with the following 
nonresponsive strategy. 

Unconditional defectors (U): Never cooperate. Never punish. 

Many of the evolutionary properties of the two-person repeated prisoner’s di- 
lemma can be derived considering a model in which only tit-for-tat (TFT, cooperate 
on the first move, and punish each defection by defecting) and ALLD (always defect) 
are present. Our strategies P and U seem like the natural generalizations of TFT and
ALLD to the n-person game with punishment, and one might (as we did) expect that 
their evolutionary dynamics would be similar. This expectation is largely incorrect. 
Understanding why provides useful insight into the evolutionary effects of punish- 
ment. For simplicity, we assume that there are no errors (e = 0) throughout this 
section. 

Let j be the number of the other n − 1 individuals in the group who are P. The
expected fitness of U individuals given j is:

$$V(U \mid j) = j\,(b/n - p) \tag{A1}$$

Similarly, the expected fitness of P individuals given j is:

$$V(P \mid j) = (b/n)(j + 1) - c - (n - 1 - j)k \tag{A2}$$

The expected fitness of U individuals averaged over all groups is:

$$W(U) = \sum_{j=0}^{n-1} m(j \mid U)\, V(U \mid j) = E(j \mid U)\,(b/n - p) \tag{A3}$$

where $m(j \mid U)$ is the probability that there are j other cooperator-punishers, given
that the focal is an unconditional defector, and $E(j \mid U)$ is the expected value of j
conditioned on the focal individual being U. An analogous calculation shows that

$$W(P) = (b/n)\bigl(E(j \mid P) + 1\bigr) - c - \bigl(n - 1 - E(j \mid P)\bigr)k \tag{A4}$$

When groups are formed at random, $E(j \mid P) = E(j \mid U) = (n - 1)q$, where q is the
frequency of P in the population just before groups are formed. To determine when U is
an ESS, let $q \to 0$ and determine when $W(U) > W(P)$. To determine when P is an ESS,
let $q \to 1$ and determine when $W(U) < W(P)$. When groups are formed assortatively
and P is rare, $E(j \mid P) = (n - 1)r$ and $E(j \mid U) = 0$. Combining these expressions yields
the condition for P to increase when rare (A6).

It follows from these expressions for the fitness of U and P that (1) unconditional 
defection is always an ESS, and (2) P is an ESS only if: 

$$c - b/n < (n - 1)p \tag{A5}$$

The left-hand side of (A5) is the per period cost to an individual of cooperating, and 
the right-hand side is the per period cost of being punished by n — 1 individuals. 
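
The two ESS claims can be checked directly from the payoffs. The following is a minimal sketch, assuming per-period payoffs of the form V(U|j) = j(b/n − p) and V(P|j) = (b/n)(j + 1) − c − (n − 1 − j)k, no errors (e = 0), and random group formation. The function names and the numerical values are hypothetical.

```python
def fitness_U(expected_j: float, n: int, b: float, p: float) -> float:
    """Payoff of an unconditional defector with expected_j punishers among the
    other n - 1 group members: a share b/n from each, minus punishment p from each."""
    return expected_j * (b / n - p)


def fitness_P(expected_j: float, n: int, b: float, c: float, k: float) -> float:
    """Payoff of a cooperator-punisher: shares from itself and the other
    punishers, minus the cost c of cooperating and the cost k of punishing each defector."""
    return (b / n) * (expected_j + 1) - c - (n - 1 - expected_j) * k


def U_is_ess(n: int, b: float, c: float, k: float, p: float) -> bool:
    """q -> 0: a rare P meets only defectors, so E(j|P) = E(j|U) = 0."""
    return fitness_U(0, n, b, p) > fitness_P(0, n, b, c, k)


def P_is_ess(n: int, b: float, c: float, k: float, p: float) -> bool:
    """q -> 1: everyone else is P, so E(j|P) = E(j|U) = n - 1."""
    return fitness_P(n - 1, n, b, c, k) > fitness_U(n - 1, n, b, p)


if __name__ == "__main__":
    # Illustrative values with c > b/n (cooperation is individually costly).
    n, b, c, k = 5, 2.0, 1.0, 0.5
    for p in (0.1, 0.3):
        print(p, U_is_ess(n, b, c, k, p), P_is_ess(n, b, c, k, p), c - b / n < (n - 1) * p)
```

For these values the last two columns agree, which is what (A5) leads one to expect; the check does not involve w at all.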

Superficially these properties seem analogous to the competition between 
always-defect and tit-for-tat in the two-person model. Always-defect is always an 
ESS; tit-for-tat is an ESS only under certain conditions. However, notice that (A5) does 
not depend on the parameter w, which measures the average number of interactions. 
Thus, if (A5) is satisfied, P is stable even if individuals interact only once! In contrast, 
tit-for-tat is stable against always-defect only if w is large enough that the long-run 
benefit of reciprocal interaction is greater than the short-term benefit of cheating. 
Tit-for-tat is never stable if individuals interact only once. 

The qualitative difference between the two models is made clearer if we con- 
sider the effect of assortative group formation. In the two-person case, assortative 
group formation makes it easier for tit-for-tat to increase when rare, and if w is near 
one, even a small amount of assortment is sufficient. In the present model, the
punishing strategy, P, can increase when rare if the following is true:

$$\underbrace{(b/n)\left[r(n - 1) + 1\right] - c}_{\text{inclusive fitness}} > \underbrace{(n - 1)(1 - r)k}_{\text{punishment}} \tag{A6}$$

The left-hand side gives the inclusive fitness advantage of cooperators relative to 
defectors. If P individuals are sufficiently likely to interact with other P individuals 
($r \to 1$), then P can increase in frequency even when it is rare in the population
because P individuals benefit from the cooperation of other P individuals in their 
groups. The right-hand side gives the effect of punishment on the fitness of P in- 
dividuals. Notice that this term is always positive. This means that cooperation 
supported by punishment is harder to get started in a population than unconditional 
cooperation. 
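
The claim that punishment makes cooperation harder to get started can be illustrated with condition (A6). The sketch below assumes (A6) has the form (b/n)[r(n − 1) + 1] − c > (n − 1)(1 − r)k and treats k = 0 as the case of unconditional cooperation. The function names, the grid search, and the parameter values are hypothetical.

```python
def P_increases_when_rare(n: int, b: float, c: float, k: float, r: float) -> bool:
    """Condition (A6): inclusive fitness advantage exceeds the cost of punishing."""
    inclusive_fitness = (b / n) * (r * (n - 1) + 1) - c
    punishment_cost = (n - 1) * (1 - r) * k
    return inclusive_fitness > punishment_cost


def minimum_assortment(n: int, b: float, c: float, k: float, steps: int = 1000):
    """Smallest r on a grid of steps + 1 points in [0, 1] at which (A6) holds,
    or None if it never holds on the grid."""
    for i in range(steps + 1):
        r = i / steps
        if P_increases_when_rare(n, b, c, k, r):
            return r
    return None


if __name__ == "__main__":
    # Illustrative values: the same game with and without costly punishment.
    print(minimum_assortment(5, 2.0, 1.0, k=0.0))
    print(minimum_assortment(5, 2.0, 1.0, k=0.5))
```

Because the punishment term is positive for every r < 1, the threshold level of assortment with k > 0 should never fall below the threshold with k = 0.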

Why are these two models so different? In models without retribution, recip- 
rocal strategies such as tit-for-tat are favored because they lead to assortative inter- 
action of cooperators (Michod and Sanderson, 1985). Even if individuals are paired at 
random, the fact that tit-for-tat individuals convert to defection if they experience 
acts of defection from others, causes a nonrandom distribution of cooperative be- 
havior: tit-for-tat individuals are more likely to receive the benefits of cooperation 
than are always-defect individuals. In contrast, in the present model, punishment has 
no effect on who receives the benefits of cooperative behavior. P individuals continue 
to cooperate while they punish, and U individuals do not respond to punishment by 
cooperating — they keep defecting. Models of reciprocity without punishment suggest 
that the strategy of punishing defectors by withdrawing cooperation is unlikely to 
work in large groups (Joshi, 1987; Boyd and Richerson, 1988). However, it is not 
unreasonable to imagine that a kind of conditional defector might respond to pun- 
ishment by cooperating much as tit-for-tat responds to cooperation with more co- 
operation. 


SHOULD DEFECTORS RESPOND TO PUNISHMENT?

Should defecting individuals respond to punishment by cooperating? To address this 
question, we consider the conditions in which $R_1$ can invade a population in which
the strategy U is common. We further assume that groups are formed at random. 

Unfortunately, the answer to this question does not depend on the fitness 
consequences of alternative behaviors alone. It also depends on what kinds of pun- 
ishing strategies are maintained in the population by nonadaptive processes like 
mutation and nonheritable environmental variation. In a population in which only U
and $R_1$ are present (and every individual accurately follows its strategy), U and $R_1$ will
have the same expected fitness. Both will defect forever and never be punished
because no punishing strategies are present. The strategies U and $R_1$ will have
different expected fitnesses only if there are punishing strategies present in the
population. If U is common, however, the expected fitness of any rare punishing strategy
must be less than the expected fitness of U. This means that any punishing strategies
present in the population must be maintained by nonadaptive processes like errors or
mutation. $R_1$ may or may not be able to invade, depending on the mix of punishing
strategies maintained by such forces.

We conjecture that the most plausible source of nonadaptive variation is mistakes 
about the behavioral context. Modelers typically assume that there is a single be- 
havioral context, with given costs and benefits, and an unambiguous set of behavioral 
strategies. However, in the real world, there are many behavioral contexts, each with
its own appropriate strategy. Before deciding how to behave, individuals must cate- 
gorize a particular situation as belonging to one context or another. It seems plausible 
that individuals sometimes miscategorize situations in which punishment is not fa- 
vored and thus mistakenly punish others. Suppose, for example, selection favors 
individual retaliation if others damage personal property. Then individuals might 
sometimes punish others who damage commonly held property because they
miscategorize the behavior.

To prove that $R_1$ may or may not be able to invade U, consider a second
punishing strategy.

Timid punishers ($T_1$): Always cooperate. Punish each defector the first
time it defects, but only the first time.

Suppose that both U and $R_1$ occasionally mistakenly play one of the punishing
strategies. This could occur because individuals mistake the behavioral context for
one in which they would normally punish. The relative fitness of U and $R_1$ depends
on which of these two punishing strategies is present. Suppose that individuals
occasionally play $T_1$ by mistake. $R_1$ can invade if a focal $R_1$ individual has higher fitness
than a focal U individual in groups with one $T_1$ individual among the other n − 1. In
such groups,


$$W(U) = \frac{b/n}{1 - w} - p \tag{A7}$$

$$W(R_1) = \frac{b/n}{1 - w} - p + \frac{w}{1 - w}\,(b/n - c) \tag{A8}$$

Thus, U is always favored if cooperation is costly. In contrast, when P is present as a
result of errors, the fitnesses of the two types are as follows:

$$W(U) = b/n - p + \frac{w}{1 - w}\,(b/n - p) \tag{A9}$$

$$W(R_1) = b/n - p + \frac{w}{1 - w}\,(2b/n - c) \tag{A10}$$

Thus, $R_1$ is favored whenever the costs of punishment exceed the cost of
cooperating.
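
The comparison between the two kinds of mistaken punisher can be made concrete. The following is a minimal sketch, assuming the fitness expressions take the forms W(U) = (b/n)/(1 − w) − p and W(R1) = (b/n)/(1 − w) − p + (w/(1 − w))(b/n − c) against a timid punisher, and W(U) = b/n − p + (w/(1 − w))(b/n − p) and W(R1) = b/n − p + (w/(1 − w))(2b/n − c) against a persistent punisher P. Function names and parameter values are hypothetical.

```python
def fitness_against_timid_punisher(n, b, c, p, w):
    """(W(U), W(R1)) in a group containing one timid punisher, who cooperates
    every period but punishes each defector only once."""
    per_period_share = b / n
    w_U = per_period_share / (1 - w) - p
    w_R1 = per_period_share / (1 - w) - p + (w / (1 - w)) * (per_period_share - c)
    return w_U, w_R1


def fitness_against_persistent_punisher(n, b, c, p, w):
    """(W(U), W(R1)) in a group containing one P individual, who punishes
    every defection in every period."""
    per_period_share = b / n
    w_U = per_period_share - p + (w / (1 - w)) * (per_period_share - p)
    w_R1 = per_period_share - p + (w / (1 - w)) * (2 * per_period_share - c)
    return w_U, w_R1


if __name__ == "__main__":
    # Illustrative values with c > b/n and p > c - b/n.
    print(fitness_against_timid_punisher(5, 2.0, 1.0, 0.8, 0.9))
    print(fitness_against_persistent_punisher(5, 2.0, 1.0, 0.8, 0.9))
```

With these forms, responding to punishment pays only when the punisher keeps punishing: against the timid punisher the difference W(R1) − W(U) has the sign of b/n − c, and against P it has the sign of p − (c − b/n).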

We think that this result is likely to be quite general. Consider a strategy that 
begins cooperating only after being punished some number of times. Such a strategy 
will have higher fitness than an unresponsive strategy only if the punishing strategies 
present in the population continue to punish on subsequent turns. If they do not, the 
unresponsive strategy gets the benefit without paying the cost. When should pun- 
ishing strategies give up? The answer to this question depends on whether the de- 
fecting strategies will respond. If defecting strategies are unresponsive, costly 
punishment provides no benefits. 

EQUILIBRIA WHEN R1 AND P COMPETE

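Let j be the number of P individuals among the other n − 1 individuals in a group.
Then the expected fitnesses of the two types are:

$$\begin{aligned} W(P) = {} & (1 - e)\left[(b/n)\bigl(E(j \mid P) + 1\bigr) - c\right] - k\left[n - 1 - (1 - e)E(j \mid P)\right] - epE(j \mid P) \\ & + \frac{w}{1 - w}\left[(1 - e)(b - c) - ek(n - 1) - epE(j \mid P)\right] \end{aligned} \tag{A11}$$

$$\begin{aligned} W(R_1) = {} & \left[(b/n)(1 - e) - p\right]E(j \mid R_1) \\ & + \frac{w}{1 - w}\left[\bigl(1 - \Pr(j = 0 \mid R_1)\bigr)(1 - e)(b - c) - epE(j \mid R_1)\right] \end{aligned} \tag{A12}$$

where $\Pr(j = 0 \mid R_1)$ is the probability that an $R_1$ individual finds himself in a group
with exactly zero P individuals.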

When groups are formed at random, $E(j \mid P) = E(j \mid R_1) = (n - 1)q$ and
$\Pr(j = 0 \mid R_1) = (1 - q)^{n-1}$, where q is the frequency of P. Making these substitutions leads to
the following condition for $R_1$ to increase:

$$(k + p)(n - 1)(1 - q) - \frac{w}{1 - w}(1 - q)^{n-1}(b - c) - \frac{b}{n} + c - p(n - 1) + \frac{ek(n - 1)}{(1 - e)(1 - w)} > 0 \tag{A13}$$

The condition for $R_1$ to be an ESS (7) is derived by setting $q = 0$ in (A13). The
condition for P to be an ESS (6) is derived by setting $q = 1$ in (A13).

To derive the necessary conditions for a stable internal equilibrium, first notice 
that the left-hand side of (A13) is a concave function with, at most, a single internal 
maximum. Thus, if neither $R_1$ nor P is an ESS, then there is a single internal
equilibrium point. If $R_1$ is not an ESS but P is, then there are two internal equilibria, one
stable and the other unstable, if, and only if, the value of the left-hand side at that 
maximum is greater than zero. The value of q that maximizes the left-hand side of 
(A13) can be found by differentiation. Substituting this back into (A13) yields the 
following necessary condition for the existence of two internal equilibria: 


$$\left[\frac{(1 - w)(k + p)}{w(b - c)}\right]^{1/(n-2)} (n - 2)(k + p) > p(n - 1) - c + \frac{b}{n} - \frac{ek(n - 1)}{(1 - e)(1 - w)} \tag{A14}$$


If this condition is not satisfied, then P is the only ESS. 
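
The equilibrium structure described above can be explored numerically. The sketch below assumes the left-hand side of (A13) has the form (k + p)(n − 1)(1 − q) − (w/(1 − w))(1 − q)^(n−1)(b − c) − [b/n − c + p(n − 1)] + ek(n − 1)/((1 − e)(1 − w)), scans q for sign changes, and refines each one by bisection. The function names, the scanning approach, and the parameter values are hypothetical and are not the simulations reported in the text.

```python
def a13_lhs(q, n, b, c, k, p, w, e):
    """Assumed left-hand side of (A13); a positive value means R1 increases
    (P decreases) when P has frequency q."""
    return ((k + p) * (n - 1) * (1 - q)
            - (w / (1 - w)) * (1 - q) ** (n - 1) * (b - c)
            - (b / n - c + p * (n - 1))
            + e * k * (n - 1) / ((1 - e) * (1 - w)))


def polymorphic_equilibria(n, b, c, k, p, w, e, steps=10_000):
    """Frequencies of P in (0, 1) at which the assumed (A13) expression is zero."""
    f = lambda q: a13_lhs(q, n, b, c, k, p, w, e)
    roots = []
    q_prev, f_prev = 0.0, f(0.0)
    for i in range(1, steps + 1):
        q = i / steps
        f_q = f(q)
        if f_prev * f_q < 0:  # sign change between neighbouring grid points
            lo, hi, f_lo = q_prev, q, f_prev
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        q_prev, f_prev = q, f_q
    return roots


if __name__ == "__main__":
    # Illustrative values: n = 4, b = 2, c = 1, k = 0.1, w = 0.9, e = 0.001.
    print(polymorphic_equilibria(4, 2.0, 1.0, 0.1, 0.1, 0.9, 0.001))  # weak punishment
    print(polymorphic_equilibria(4, 2.0, 1.0, 0.1, 0.2, 0.9, 0.001))  # stronger punishment
```

Because the expression is concave in q, the scan can find at most two sign changes; when it finds two, the one at the lower frequency of P is the stable equilibrium, since the expression changes from negative to positive there.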

To derive the condition for $R_1$ to increase when groups are formed assortatively, let
$E(j \mid P) = (n - 1)(r + (1 - r)q)$ and $E(j \mid R_1) = (n - 1)(1 - r)q$ and proceed in the same way.


EQUILIBRIA WHEN R1, E, AND P COMPETE

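Let i and j be the numbers of E and P individuals among the other n − 1 individuals.
The expected fitnesses of the three types are:

$$\begin{aligned} W(P) = {} & (1 - e)\left[(b/n)\bigl(E(i \mid P) + E(j \mid P) + 1\bigr) - c\right] \\ & - k\left[n - 1 - (1 - e)\bigl(E(i \mid P) + E(j \mid P)\bigr)\right] - epE(j \mid P) \\ & + \frac{w}{1 - w}\left[(1 - e)(b - c) - ek(n - 1) - epE(j \mid P)\right] \end{aligned} \tag{A15}$$

$$\begin{aligned} W(R_1) = {} & (b/n)(1 - e)\bigl(E(i \mid R_1) + E(j \mid R_1)\bigr) - pE(j \mid R_1) \\ & + \frac{w}{1 - w}\Pr(j > 0 \mid R_1)\left[(1 - e)(b - c) - epE(j \mid R_1 \,\&\, j > 0)\right] \\ & + \frac{w}{1 - w}(1 - e)(b/n)\Pr(j = 0 \mid R_1)\,E(i \mid R_1 \,\&\, j = 0) \end{aligned} \tag{A16}$$

$$\begin{aligned} W(E) = {} & (1 - e)\left[(b/n)\bigl(E(i \mid E) + E(j \mid E) + 1\bigr) - c\right] - epE(j \mid E) \\ & + \frac{w}{1 - w}\Pr(j > 0 \mid E)\left[(1 - e)(b - c) - epE(j \mid E \,\&\, j > 0)\right] \\ & + \frac{w}{1 - w}(1 - e)\Pr(j = 0 \mid E)\left[(b/n)\bigl(E(i \mid E \,\&\, j = 0) + 1\bigr) - c\right] \end{aligned} \tag{A17}$$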

Assume that groups are formed at random so that $E(j \mid E) = E(j \mid P) = E(j \mid R_1) = (n - 1)q_P$,
$E(i \mid E) = E(i \mid P) = E(i \mid R_1) = (n - 1)q_E$, $\Pr(j = 0 \mid E) = \Pr(j = 0 \mid R_1) = (1 - q_P)^{n-1}$,
and $E(i \mid R_1 \,\&\, j = 0) = E(i \mid E \,\&\, j = 0) = (n - 1)\bigl(q_E/(1 - q_P)\bigr)$, where $q_E$ and $q_P$ are the
frequencies of E and P. When $q_E = 1$, $W(E) < W(R_1)$ and when $q_P = 1$, $W(P) < W(E)$.

First, we derive conditions for the existence of an internal equilibrium and show 
that if such an equilibrium exists, it is unique. 

It is useful to define the following functions, which give the difference in fitness 
between each pair of strategies as a function of q P and q E : 

$$d_{PE}(q_P, q_E) = W(P) - W(E) \tag{A18}$$

$$d_{RE}(q_P, q_E) = W(R_1) - W(E) \tag{A19}$$

$$d_{PR}(q_P, q_E) = W(P) - W(R_1) \tag{A20}$$

Using equations (A15), (A16), and (A17) and the assumption of random group
formation yields the following expression for $d_{RE}$:

$$d_{RE}(q_P, q_E) = -p(1 - e)(n - 1)q_P + (1 - e)(c - b/n)\left[1 + \frac{w}{1 - w}(1 - q_P)^{n-1}\right] \tag{A21}$$

Notice that the relative fitness of $R_1$ and E depends only on $q_P$. Further, note that (1)
$d_{RE}(0, q_E) > 0$; (2) $d_{RE}(1, q_E) < 0$ as long as $c - b/n < (n - 1)p$, which is true by
assumption; and (3) $d_{RE}$ is a monotonically decreasing function of $q_P$. Thus, the value
of $q_P$ at equilibrium is unique and can be found by finding the root of $d_{RE} = 0$ as
shown in figure A1. Let this value of $q_P$ be $\hat{q}_P$.
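
Because the fitness difference between $R_1$ and E decreases monotonically in the frequency of P and changes sign on (0, 1), its root can be located by bisection. The following is a minimal sketch, assuming that difference has the form −p(1 − e)(n − 1)q_P + (1 − e)(c − b/n)[1 + (w/(1 − w))(1 − q_P)^(n−1)]. The function names, the tolerance, and the parameter values are hypothetical.

```python
def d_RE(q_P, n, b, c, p, w, e):
    """Assumed form of W(R1) - W(E) under random group formation; it depends
    on the frequency of P only."""
    return (-p * (1 - e) * (n - 1) * q_P
            + (1 - e) * (c - b / n) * (1 + (w / (1 - w)) * (1 - q_P) ** (n - 1)))


def root_of_d_RE(n, b, c, p, w, e, tol=1e-12):
    """Bisection for the unique frequency of P in (0, 1) at which d_RE = 0.

    Requires d_RE(0) > 0 > d_RE(1), i.e. c > b/n and c - b/n < (n - 1) * p.
    """
    lo, hi = 0.0, 1.0
    if not (d_RE(lo, n, b, c, p, w, e) > 0 > d_RE(hi, n, b, c, p, w, e)):
        raise ValueError("d_RE does not change sign on [0, 1] for these parameters")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_RE(mid, n, b, c, p, w, e) > 0:
            lo = mid  # root lies above mid because d_RE is decreasing
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative values only.
    print(root_of_d_RE(n=4, b=2.0, c=1.0, p=1.0, w=0.9, e=0.001))
```

Bisection is used here only because monotonicity guarantees a single sign change; any bracketing root finder would serve.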

Once again, using equations (A15), (A16), and (A17) and the assumption of
random group formation yields the following expression for $d_{PE}$:

$$d_{PE}(q_P, q_E) = -\frac{ek(n - 1)}{1 - w} - k(1 - e)(n - 1)(1 - q_E - q_P) + \frac{wb(n - 1)(1 - e)(1 - q_P)^{n-2}(1 - q_P - q_E)}{n(1 - w)}$$

Assume that $q_E$ is fixed at some value. Then

$$d_{PE}(1 - q_E, q_E) = -\frac{ek(n - 1)}{1 - w} < 0$$

$$d_{PE}(0, q_E) = -\frac{ek(n - 1)}{1 - w} + (1 - q_E)(1 - e)(n - 1)\left(\frac{wb}{n(1 - w)} - k\right)$$

Thus, $d_{PE}(0, q_E) > 0$ if

$$q_E < 1 - \frac{ke}{(1 - e)\bigl(w(b/n) - k(1 - w)\bigr)} \tag{A22}$$

and

$$w(b/n) - k(1 - w) > 0 \tag{A23}$$

Otherwise, $d_{PE} < 0$. Differentiating shows that $d_{PE}$ is a convex function of $q_P$. Thus,
if (A22) and (A23) are satisfied, $d_{PE}(q_P, q_E) = 0$ has a unique root for each $q_E$ as
illustrated in figure A1. Let this root be $q'_P(q_E)$. Increasing $q_E$ leads to a decrease in
$q'_P(q_E)$. Thus, there is an internal equilibrium value if, and only if, $q'_P(0) > \hat{q}_P$, where
$\hat{q}_P$ is the root of $d_{RE} = 0$, and if it exists, such an equilibrium is unique. This result is
shown graphically in figure A1.

Figure A1. This figure illustrates the logic of the proofs given in this section. The left-hand
pair of figures represents a situation in which there is a single polymorphic equilibrium
on the $R_1$-P boundary. The lower figure shows $d_{RE}(q_P, 0)$ and $d_{PE}(q_P, 0)$. These curves
intersect only once since there is a single polymorphic equilibrium. Thus, we know that
$q'_P < \hat{q}_P$. The upper figure shows how the forms of $d_{RE}(q_P, q_E)$ and $d_{PE}(q_P, q_E)$ guarantee
that there is no internal equilibrium in this case. The right-hand pair of figures represents the
situation in which there is no polymorphic equilibrium on the $R_1$-P boundary
because P increases for all values of $q_P < 1$.

We know that $d_{RE}(q_P)$ is monotonically decreasing and has one root in the
interval (0, 1) whenever $R_1$ is potentially present, and that $d_{PE}(q_P)$ has at most one
root and is monotonically decreasing in the interval that contains the root.

Next, we show that if there is no stable polymorphic equilibrium on the P-$R_1$
boundary in the absence of E, then there is an internal equilibrium. If there is no
stable equilibrium on the boundary in the absence of E, it follows from the results of
the previous section that

$$d_{PR}(q_P, 0) > 0$$

for all $q_P$. Next, note that

$$d_{PR}(q_P, q_E) = d_{PE}(q_P, q_E) - d_{RE}(q_P, q_E) \tag{A24}$$

Thus, there is an internal equilibrium since $d_{PE}(q_P, 0) > d_{RE}(q_P, 0)$ for all values of $q_P$.
This situation is shown in the right-hand pair of figures in figure A1.

Next, we show that if there is a stable, polymorphic boundary equilibrium such
that $q_E = 0$ and $W(R_1) > W(P)$ for $q_P = 1$, then there is no internal equilibrium. Let $\tilde{q}_P$
be the frequency of P at a polymorphic equilibrium on the P-$R_1$ boundary. Then
$d_{PR}(\tilde{q}_P, 0) = 0$, which implies that $d_{PE}(\tilde{q}_P, 0) = d_{RE}(\tilde{q}_P, 0)$. The fact that the
equilibrium is stable in the absence of E implies that $\partial d_{PR}(q_P, 0)/\partial q_P < 0$ at $\tilde{q}_P$. Since
$d_{PR}(1, 0) < 0$, it follows that $d_{PE}(q_P, 0) < d_{RE}(q_P, 0)$ for $\tilde{q}_P < q_P < 1$. But this means
that $q'_P(0) < \hat{q}_P$, where $q'_P(0)$ is the root of $d_{PE}(q_P, 0) = 0$ and $\hat{q}_P$ is the root of
$d_{RE} = 0$, and, therefore, there is no internal equilibrium as shown in the left-hand
pair of figures in figure A1.

It is important to note that there may be no internal equilibrium even if
$W(R_1) < W(P)$. When this is the case, there is a second, unstable internal equilibrium
on the $R_1$-P boundary. Anytime that $d_{PE} = d_{RE} < 0$ at this equilibrium, there will be
no internal equilibrium, and numerical studies suggest that this is what actually
occurs at the vast majority of parameter combinations.


M IS AN ESS AGAINST P AND E

Assume that $M$ is common. When groups are formed at random, $M$ can resist
invasion by rare $R_1$ individuals if the average payoff of $M$ in groups with $n-1$ other $M$
individuals, $V(M \mid n-1)$, is greater than the average payoff of $R_1$ in groups in which
the other $n-1$ individuals are $M$, $V(R_1 \mid n-1)$:

V[M\ n - 1) = } l _ w t(b -c)[l-e)-e[n-m + p)) (A25) 

V(Ri\ n - 1 ) = (n - l)(b/n)(l -e)-[n- 1 )p 

+ jTjn- d)(b -c)- p[n - 1X1 - Cl - eT)] (A26) 

Substituting these expressions and simplifying yields condition (9). Similarly, the
expected fitness of an $E$ individual in a group of $n-1$ $M$ individuals, $V(E \mid n-1)$, is:

V(E\n-l) = 0-e)(b-c)-e(n-l)p 

+ ^ CD - eXb -cj- e[n - 1 )p - p{n - 1)(1 - (1 - *)")] (A27) 

This expression is used to determine when $V(M \mid n-1) > V(E \mid n-1)$, which yields equation (10).

NOTE 

We thank Alan Rogers for useful comments and for carefully checking every result in 
this chapter. Joel Peck also provided helpful comments.

10 Why People Punish Defectors

Weak Conformist Transmission 
Can Stabilize Costly Enforcement 
of Norms in Cooperative 
Dilemmas 

With Joseph Henrich 

In many societies, humans cooperate in large groups of unrelated
individuals. Most evolutionary explanations for cooperation combine kinship
(Hamilton, 1964) and reciprocity (“reciprocal altruism,” Trivers, 1971). These
mechanisms seem to explain the evolution of cooperation in many species,
including ants, bees, naked mole rats, and vampire bats. However, because social
interaction among humans often involves large groups of mostly unrelated
individuals, explaining cooperation has proved a tricky problem for both
evolutionary and rational choice theorists. Evolutionary models of cooperation using the
repeated n-person prisoner’s dilemma predict that cooperation is not likely to be
favored by natural selection if groups are larger than around 10, unless
relatedness is very high (Boyd and Richerson, 1988). As group size rises above 10, to
100 or 1000, cooperation is virtually impossible to evolve or maintain with only
reciprocity and kinship. 1

Many students of human behavior believe that large-scale human coopera- 
tion is maintained by the threat of punishment. From this view, cooperation 
persists because the penalties for failing to cooperate are sufficiently large that 
defection “doesn’t pay.” However, explaining cooperation in this way leads to 
a new problem: why do people punish noncooperators? If the private benefits 
derived from punishing are greater than the costs of administering it, punishment 
may initially increase but cannot exceed a modest frequency (Boyd and Richerson, 
1992). Individuals who punish defectors provide a public good, and thus can be
exploited by nonpunishing cooperators if punishment is costly. Second-order free
riders cooperate in the main activity but cheat when it comes time to punish
noncooperators. As a consequence, second-order free riders receive higher
payoffs than punishers do, and thus punishment is not evolutionarily stable. Adding
third- (third-order punishers punish second-order free riders) or higher-order
punishers only pushes the problem back to higher orders. Solving this problem 
is important because there is widespread agreement that the threat of punish- 
ment plays an important role in the maintenance of cooperation in many human 
societies. 

Social scientists have explained the maintenance of punishment in three 
ways: (1) many authors assume that a state or some other external institution 
does the punishing; (2) others assume punishing is costless (McAdams, 1997; 
Hirshleifer and Rasmussen, 1989); and (3) a few scholars incorporate a recursive 
punishing method in which punishers punish defectors, individuals who fail to 
punish defectors, individuals who fail to punish nonpunishers, and so on in an 
infinite regress (Boyd and Richerson, 1992; Fudenberg and Maskin, 1986).
However, none of these solutions is satisfactory. While it is useful to assume 
institutional enforcement in modern contexts, it leaves the evolution and 
maintenance of punishment unexplained because at some point in the past there 
were no states or institutions. Furthermore, the state plays a very small role 
in many contemporary small-scale societies that nonetheless exhibit a great deal 
of cooperative behavior. This solution avoids the problem of punishment by 
relocating the costs of punishment outside the problem. The second solution, 
instead of relocating the costs, assumes that punishment is costless. This seems 
unrealistic because any attempt to inflict costs on another must be accompanied 
by at least some tiny cost — and any nonzero cost lands both genetic evolutionary 
and rational choice approaches back on the horns of the original punishment 
dilemma. The third solution, pushing the cost of punishment out to infinity, also 
seems unrealistic. Do people really punish people who fail to punish other
nonpunishers, and do people punish people who fail to punish people who fail
to punish nonpunishers of defectors, and so on, ad infinitum? Although the
infinite recursion is cogent, it seems like a mathematical trick.


Conformist Transmission in Social Learning 
Can Stabilize Punishment 

In this chapter, we argue that the evolution of cooperation and punishment
is plausibly a side effect of a tendency to adopt common behaviors during
enculturation. Humans are unique among primates in that they acquire much of
their behavior from other humans via social learning. However, both theory and
evidence suggest that humans do not simply copy their parents, nor do they copy
other individuals at random (Henrich and Boyd, 1998; Takahasi, 1998; Harris,
1998). Instead, people seem to use social learning rules like “copy the
successful” (termed payoff-biased or prestige-biased transmission; see Henrich and
Gil-White, 2001) and “copy the majority” (termed conformist transmission; Boyd
and Richerson, 1985; Henrich and Boyd, 1998), which allow them to shortcut
the costs of individual learning and experimentation and leapfrog directly to
adaptive behaviors. These specialized social learning mechanisms provide a
generalized means of rapidly sifting through the wash of information available in the
social world and inexpensively extracting adaptive behaviors. These social
learning shortcuts do not always result in the best behaviors, nor do they prevent
the acquisition of maladaptive behaviors. Nevertheless, when averaged over
many environments and behavioral domains (e.g., foraging, hunting, and social
interaction), these cultural transmission mechanisms provide fast and frugal
means to acquire complex, highly adaptive behavioral repertoires.

Both theoretical and empirical research indicates that conformist
transmission plays an important role in human social learning. First, we have already shown
that a heavy reliance on conformist transmission outcompetes both unbiased
(i.e., vertical) transmission and individual learning under a wide range of
conditions (Henrich and Boyd, 1998), especially when problems are difficult.
Second, empirical research by psychologists, economists, and sociologists shows
that people are likely to adopt common behaviors across a wide range of decision
domains. Although much of this work focuses on easy perceptual tasks (Asch,
1951) and confounds normative conformity (going with the popular choice to
avoid appearing deviant) with conformist transmission (using the popularity of a
choice as an indirect measure of its worth), more recent work shows that social
learning and conformist transmission are important in difficult individual
problems (Baron, Vandello, and Brunsman, 1996; Insko, Smith, Alicke, Wade, and
Taylor, 1985; Campbell and Fairey, 1989), voting situations (Wit, 1999), and
cooperative dilemmas (Smith and Bell, 1994).

Conformist transmission can stabilize costly cooperation without punish- 
ment but only if it is very strong. All other things being equal, payoff-biased 
transmission causes higher payoff variants to increase in frequency, and thus 
cooperation is not evolutionarily stable under plausible conditions — because not 
cooperating leads to higher payoffs than cooperating. Thus, payoff-biased 
transmission, alone, suffers the same problem as natural selection in genetic 
evolution. However, under conformist transmission individuals preferentially 
adopt common behaviors and thus increase the frequency of the most common 
behavior in the population. Thus, if cooperation is common, conformist trans- 
mission will oppose payoff-biased transmission and, as long as cooperation is not 
too costly, maintain cooperative strategies in the population. However, if the 
costs of cooperation are substantial, it is less likely that conformist transmission 
will be able to maintain cooperation. 

A quite different logic applies to the maintenance of punishment. Suppose 
that both punishers and cooperators are common and that being punished is 
costly enough that cooperators have higher payoffs than defectors. Rare invading 
second-order free riders who cooperate but do not punish will achieve higher 
payoffs than punishers because they avoid the costs of punishing. However, 
because defection does not pay, the only defections will be due to rare mistakes, 
and thus the difference between the payoffs of punishers and second-order free 
riders will be relatively small. Hence, conformist transmission is more likely to 
stabilize the punishment of noncooperators than cooperation itself. As we as- 
cend to higher-order punishing, the difference between the payoffs to punishing 
versus nonpunishing decreases geometrically toward zero because the occasions 
that require the administration of punishment become increasingly rare. Second- 
order punishing is required only if someone erroneously fails to cooperate, and 
then someone else erroneously fails to punish that mistake. For third-order
punishment to be necessary, yet another failure to punish must occur. As the
number of punishing stages ($i$) increases, conformist transmission, no matter
how weak, will at some stage overpower payoff-biased imitation and stabilize
common $i$-th order punishment. Once punishment is stable at the $i$-th stage,
payoffs will favor strategies that punish at the $(i-1)$-th order, because common
punishers at the $i$-th order will punish nonpunishers at stage $i-1$. Stable
punishment at stage $i-1$ means payoffs at stage $i-2$ will favor punishing
strategies, and so on down the cascade of punishment. Eventually, common first-order
punishers will stabilize cooperation at stage 0.

It is important to see that the stabilization of punishment is, from the gene’s 
point of view, a maladaptive side effect of conformist transmission. If there were 
genetic variability in the strength of conformist transmission ($\alpha$) and cooperative
dilemmas were the only problem humans faced, then conformist transmission 
might never evolve. However, human social learning mechanisms were selected 
for their capability to efficiently acquire adaptive behaviors over a wide range 
of behavioral domains and environmental circumstances — from figuring out 
what foods to eat, to deciding what kind of person to marry — precisely because 
it is costly for individuals to determine the best behavior. Hence, we should ex- 
pect conformist transmission to be important in cooperation as long as distin- 
guishing cooperative dilemmas from other kinds of problems is difficult, costly, 
or error-prone. Looking across human societies, we find that cooperative dilem- 
mas come in an immense variety of forms, including harvest rituals among ag- 
riculturalists, barbasco fishing among Amazonian peoples, warfare, irrigation 
projects, taxes, voting, meat sharing, and anti-smoking pressure in public places. 
It is difficult to imagine a cognitive mechanism capable of distinguishing coop- 
erative circumstances from the myriad of other problems and social interactions 
that people encounter. 

In what is to come, we formalize this argument. Our goal is to demonstrate 
the soundness of our reasoning and show how very weak conformist transmission 
can stabilize cooperation and punishment. After demonstrating this, we will de- 
scribe how cooperation, once it is stabilized in one group, can spread across many 
populations via cultural group selection. We will also briefly show how genes for 
prosocial behavior may eventually spread in the wake of cultural evolution. 


A Cultural Evolutionary Model of Cooperation and Punishment 

In this model, a large number of groups, each consisting of N individuals, are 
drawn at random from a very large population. Individuals within each group 
interact with one another in an i + 1 stage game. The first stage is a one-shot 
cooperative dilemma, which is followed by i stages in which individuals can 
punish others. We number the first, cooperative stage as 0 and the punishment 
stages as 1, . . . , i. The behavior of individuals during each stage is determined by 
a separate culturally acquired trait with two variants, P (prosocial variant) and
NP (not prosocial variant).

During the initial cooperative dilemma, individuals can either “cooperate”
(contribute to a public good) or “defect” (not contribute and free-ride on the
contributions of others). Each cooperator pays a cost $C$ to contribute a benefit $B$
($B > C$) to the group; this $B$ is divided equally among all group members.
Defectors do not pay the cost of cooperation ($C$) but do share equally in the total
benefits. The variable $p_0$ represents the frequency of individuals in the
population with the cooperative variant in stage 0. People with the cooperative variant
“intend” to cooperate but mistakenly defect with probability e. Individuals who 
have the defecting variant always defect. This makes sense because, in the real 
world, people may intend to cooperate but fail to for some reason. For example, 
a friend who plans to help you move may forget to show up or have car trouble 
en route. Defectors, however, are unlikely to mistakenly show up on moving day 
and start carrying boxes. We will assume errors are rare, so the value of e is small. 

During the first punishment stage, individuals can punish those who
defected during the cooperation stage. Doing this reduces the payoff of the
individuals who are punished by an amount $\rho$, at a cost of $\phi$ to the punisher
($\phi < \rho < C$). Individuals with the punishing (P) variant at this stage intend to
punish but mistakenly fail to punish with probability $e$. Nonpunishers, those
with the NP-variant at stage 1, do nothing. We use $p_1$ to stand for the frequency
of first-stage punishers (i.e., individuals who have the P-variant at stage 1), and
$(1 - p_1)$ gives the frequency of first-stage free riders.

During the second punishment stage, individuals with the P-variant punish
those who did not punish the noncooperators during the previous stage with
probability $(1 - e)$ and mistakenly fail to punish with probability $e$. As
before, punishment costs punishers $\phi$ to administer and costs those being punished
an amount $\rho$. Those with the NP-variant at stage 2 do not punish. Let $p_2$ be the
frequency of second-stage punishers. At stage 3, individuals with the P-variant
will punish individuals from stage 2 who failed to punish nonpunishers from
stage 1. The costs of punishment remain the same. Those with the NP-variant
in stage 3 will not punish anyone from stage 2. The pattern repeats as one
descends to stage $i$ in table 10.1 ($p_i$ gives the frequency of punishers at stage $i$).
Because the interaction ends after stage $i$, individuals who fail to punish on stage
$i$ cannot be punished. Note that the trait that controls individual behavior at each
stage has only two variants, and the values of variants at different stages are
independent, so an individual could cooperate at stage 0 (have the P-variant),
not punish at stage 1 (NP-variant), and punish at stage 2 (P-variant).


Table 10.1. Dichotomous traits for cooperation and punishment

Stage  Frequency of P-variant  P-variant                           NP-variant
0      p_0                     Cooperate                           Defect
1      p_1                     Punish defectors                    Do not punish defectors
2      p_2                     Punish nonpunishers at stage 1      Do not punish nonpunishers at stage 1
3      p_3                     Punish nonpunishers at stage 2      Do not punish nonpunishers at stage 2
i      p_i                     Punish nonpunishers at stage i - 1  Do not punish nonpunishers at stage i - 1

After all the punishments are complete, cultural transmission takes place.
As we explained earlier, two components of human cognition create forces that
change the frequency of the different variants: payoff-biased and
conformist-biased imitation. Equation (1) gives the change in the frequency of stage 0
cooperators as a consequence of payoff-biased and conformist transmission (see
Henrich, 1999).

$$\Delta p_0 = p_0(1 - p_0)\Big[\underbrace{(1 - \alpha)\,\beta\,(b_C - b_D)}_{\text{payoff-biased}} + \underbrace{\alpha\,(2p_0 - 1)}_{\text{conformist}}\Big] \tag{1}$$

The parameter $\alpha$ varies from 0 to 1 and represents the strength of conformist
transmission in human psychology relative to payoff-biased transmission. We will
generally assume $\alpha$ is positive but small. Practically speaking, $\alpha$ must be less than
0.50, because otherwise beneficial variants would never spread: once a variant
became common, it would remain common no matter how deleterious. The
second term in equation (1), labeled “conformist,” varies in magnitude from $-\alpha$
to $+\alpha$ and is the component of the overall bias contributed by conformist
transmission. In the term labeled “payoff-biased,” the symbols $b_C$ and $b_D$ are the
payoffs to cooperators and defectors, respectively. The quantity $(b_C - b_D)$, which
we label $\Delta b_0$, gives the difference in payoffs between cooperation (P-variant) and
defection (NP-variant) in stage 0. More generally, $\Delta b_i$ is the difference in payoffs
between the P- and NP-variants during the $i$-th stage. The parameter $\beta$
normalizes the quantity $\Delta b_i$ so that it varies between $-1$ and $+1$, and therefore
$\beta = 1/|\Delta b_i|_{\max}$. Thus, the term labeled “payoff-biased” varies between $-(1 - \alpha)$
and $+(1 - \alpha)$ and represents the component of the overall bias contributed by
payoff-biased transmission.

The expected payoffs, $b$, to the P- and NP-variant at each stage depend on
the rate of errors, the costs of cooperation and punishment, and the frequency of
cooperators and punishers in the population. At stage 0, cooperators receive an
average payoff of $b_C$, while defectors receive an average payoff of $b_D$:

$$\begin{aligned}
b_C &= (1 - e)\big[p_0 B(1 - e) - C + e\,(p_0 B - N p_1 \rho)\big],\\
b_D &= (1 - e)(p_0 B - N p_1 \rho),\\
\Delta b_0 &= b_C - b_D = (1 - e)\big(N p_1 (1 - e)\rho - C\big)
\end{aligned} \tag{2}$$

As we mentioned, the term $\Delta b_0$ gives the difference in payoffs between the
two variants that control stage 0 behavior.
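As a minimal sketch (not the authors' code), equations (1) and (2) can be written out in Python. The sketch assumes equation (2) reads $b_C = (1-e)[p_0 B(1-e) - C + e(p_0 B - N p_1 \rho)]$ and $b_D = (1-e)(p_0 B - N p_1 \rho)$. The function names, the argument name `rho` for the cost of being punished, and all numerical values are illustrative assumptions.

```python
def stage0_payoffs(p0, p1, B, C, rho, N, e):
    """Expected stage 0 payoffs (b_C, b_D), following equation (2)."""
    b_C = (1 - e) * (p0 * B * (1 - e) - C + e * (p0 * B - N * p1 * rho))
    b_D = (1 - e) * (p0 * B - N * p1 * rho)
    return b_C, b_D


def delta_p0(p0, p1, B, C, rho, N, e, alpha, beta):
    """One-step change in the frequency of cooperators, equation (1)."""
    b_C, b_D = stage0_payoffs(p0, p1, B, C, rho, N, e)
    payoff_biased = (1 - alpha) * beta * (b_C - b_D)
    conformist = alpha * (2 * p0 - 1)
    return p0 * (1 - p0) * (payoff_biased + conformist)


# Arbitrary illustrative values (assumptions, not taken from the chapter).
params = dict(B=3.0, C=1.0, rho=0.5, N=10, e=0.01)

# Without punishers (p1 = 0) the payoff gap reduces to -C * (1 - e).
b_C, b_D = stage0_payoffs(p0=0.9, p1=0.0, **params)
gap_without_punishers = b_C - b_D

# Weak conformity (alpha = 0.1) cannot rescue cooperation in that case.
beta_0 = 1 / (params["C"] * (1 - params["e"]))
change = delta_p0(p0=0.9, p1=0.0, alpha=0.1, beta=beta_0, **params)
```

With these values the change in $p_0$ is negative, which matches the claim that payoff-biased transmission never favors cooperation when there are no punishers.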

A Heuristic Analysis 

Let us first analyze equation (1) by asking under what conditions
transmission will favor cooperation ($\Delta p_0 > 0$) in the absence of stage 1 punishers ($p_1 = 0$).
In this case, $\Delta b_0 = -C(1 - e)$, which is always negative; hence, payoff-biased
transmission never favors cooperation in the absence of punishment. So, to give
cooperation its best chance, we assume that by some stochastic fluctuation, the
frequency of cooperators ends up near one. How big does $\alpha$ have to be so that
conformist transmission overpowers payoff-biased transmission and increases the
frequency of cooperators? The frequency of cooperators increases when


$$\alpha_0 > \frac{\beta_0 C(1 - e)}{1 + \beta_0 C(1 - e)} \tag{3}$$


where $\alpha_i$ (here, $i = 0$) is the minimum value of $\alpha$ that favors the spread or
maintenance of the P-variant at stage $i$ ($\Delta p_i > 0$). With no punishment, $\beta = 1/|\Delta b_i|_{\max}$
means $\beta_0 = 1/(C(1 - e))$. As a consequence, $\alpha_0$ must be greater than 0.50, and
as we mentioned earlier, $\alpha_i > 0.50$ seems extremely unlikely because such
high values would prevent the diffusion of novel practices; cultures would be
entirely static (see Henrich, 2001). Hence, conformist transmission, operating
directly on cooperative strategies, is unlikely to maintain cooperation in the
absence of punishment.
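The threshold logic used here can be sketched in a few lines of Python. The sketch assumes that, near $p = 1$, the conformist term is about $+\alpha$ and the payoff-biased term is $-(1-\alpha)\beta|\Delta b|$, so that the P-variant is maintained when $\alpha > x/(1+x)$ with $x = \beta|\Delta b|$. The helper name `min_conformity` and the numbers are hypothetical.

```python
def min_conformity(beta, payoff_gap):
    """Smallest alpha at which conformity outweighs a payoff deficit.

    payoff_gap is the magnitude |delta b| of the payoff disadvantage of the
    common P-variant; the condition alpha > (1 - alpha) * beta * payoff_gap
    rearranges to alpha > x / (1 + x) with x = beta * payoff_gap.
    """
    x = beta * payoff_gap
    return x / (1 + x)


# Illustrative values (assumptions): cost of cooperation and error rate.
C, e = 1.0, 0.02
gap = C * (1 - e)      # |delta b_0| when there are no punishers
beta_0 = 1 / gap       # normalization 1 / |delta b_0|_max
alpha_0 = min_conformity(beta_0, gap)
```

Because the normalization makes $x = 1$ in this case, `alpha_0` comes out at one half regardless of the values chosen for `C` and `e`.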

Now, let us examine the conditions under which first-stage punishment will 
increase in frequency. Again, the change in the frequency of first-stage punishers, 
$\Delta p_1$, is affected by both payoff-biased and conformist transmission:


$$\Delta p_1 = p_1(1 - p_1)\big[(1 - \alpha)\,\beta\,(b_{P1} - b_{NP1}) + \alpha\,(2p_1 - 1)\big] \tag{4}$$


The payoffs ($b$'s) to punishment and nonpunishment depend on the cost of
punishing ($\phi$) and of being punished ($\rho$), as well as the chance of mistakenly not
punishing ($e$). The subscript P1 indicates the P-variant at stage 1, while NP1
indicates the NP-variant at stage 1.


$$\begin{aligned}
b_{P1} &= -(1 - e)N\phi\,(1 - p_0 + p_0 e) - eNp_2\rho\,(1 - e),\\
b_{NP1} &= -Np_2(1 - e)\rho,\\
\Delta b_1 &= b_{P1} - b_{NP1} = -N(1 - e)\big[\phi\,(1 - (1 - e)p_0) - p_2(1 - e)\rho\big]
\end{aligned} \tag{5}$$


Assuming that there is only one punishment stage ($i = 1$), and that cooperators
and stage 1 punishers are initially common ($p_0 = 1$ and $p_1 = 1$), then
$\Delta b_1 = -N(1 - e)e\phi$. If errors are rare enough that terms involving $e^2$ are negligible,
then $\Delta b_1 \approx -Ne\phi$. Thus, the difference in payoff between the P-variant and
the NP-variant at stage 1 is just the cost of punishing cooperators who make
errors. If $e < 1/N$, which is plausible unless groups are very large, then the magnitude of $\Delta b_1$ is
less than $\phi$, and smaller than that of $\Delta b_0$ because $\phi < \rho < C$. Note that, when $i > 0$,
$\beta = 1/(N(1 - e)(\rho(1 - e) + e\phi))$, so the threshold value of $\alpha$ necessary to stabilize
cooperation in a two-stage game, $\alpha_1$, is: 2


$$\alpha_1 > \frac{\phi e}{\rho(1 - e) + 2\phi e} \approx \frac{e\phi}{\rho} \tag{6}$$


Equation (6) tells us that $\alpha_1$ depends only on the error rate and the ratio of the
cost of punishing to the cost of being punished. It also says that unless punishing
is much more costly than being punished ($2\phi e > \rho$), the threshold strength of
conformism necessary to maintain first-stage punishment is small and less than the
amount of conformism necessary to stabilize 0-th stage cooperation ($\alpha_0 > \alpha_1 \approx e$).
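To make the size of this threshold concrete, the following Python sketch evaluates equation (6), assuming it reads $\alpha_1 > \phi e / (\rho(1-e) + 2\phi e)$, and cross-checks it against the payoff difference $|\Delta b_1| = N(1-e)e\phi$ and the normalization $\beta = 1/(N(1-e)(\rho(1-e)+e\phi))$. The function names and the parameter values are assumptions chosen only for illustration.

```python
def alpha_1(phi, rho, e):
    """Threshold conformity for first-stage punishment, equation (6)."""
    return phi * e / (rho * (1 - e) + 2 * phi * e)


def alpha_1_from_payoffs(phi, rho, e, N):
    """Same threshold, rebuilt from the payoff gap and the normalization."""
    gap = N * (1 - e) * e * phi                         # |delta b_1| at p0 = p1 = 1
    beta = 1 / (N * (1 - e) * (rho * (1 - e) + e * phi))
    x = beta * gap
    return x / (1 + x)


# Illustrative values (assumptions) satisfying phi < rho and e < 1/N.
phi, rho, e, N = 0.2, 0.5, 0.01, 10
closed_form = alpha_1(phi, rho, e)
rebuilt = alpha_1_from_payoffs(phi, rho, e, N)
approximation = e * phi / rho
```

For these values the threshold is close to $e\phi/\rho = 0.004$, two orders of magnitude below the one half required to stabilize cooperation directly.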

If we do the same analysis for stage 2, we get the following expressions for
$\Delta p_2$ and $\Delta b_2$:


$$\Delta p_2 = p_2(1 - p_2)\big[(1 - \alpha)\,\beta\,\Delta b_2 + \alpha\,(2p_2 - 1)\big] \tag{7}$$

$$\Delta b_2 = b_{P2} - b_{NP2} = -(1 - e)N\Big[\phi\,\big(1 - p_1(1 - e)\big)\big(1 - [p_0(1 - e)]^N\big) - p_3(1 - e)\rho\Big] \tag{8}$$

The first term inside the square brackets in equation (8) is proportional to the
number of individuals who did not punish during stage 1, $(1 - p_1(1 - e))$, and
to the probability that there was at least one defector during stage 0,
$(1 - [p_0(1 - e)]^N)$. The quantity $p_0(1 - e)$ is the expected frequency of cooperators
who did not make a mistake; thus, $[p_0(1 - e)]^N$ gives the probability that a group
contains all cooperators who did not make a mistake. To get the probability
that a group contains at least one defector, we simply subtract this probability
from one. The second term inside the brackets is the cost of being punished
during stage 3 for failing to punish during stage 2. If no third-stage punishers
exist ($p_3 = 0$), and first-stage punishers and cooperators are initially very
common, then $\Delta b_2 \approx -(eN)^2\phi$. Note that the difference in payoffs, $\Delta b_2$, is a factor of $eN$
smaller than $\Delta b_1$, but the strength of conformist transmission remains constant.
Calculating the required size of $\alpha_2$ we get:


$$\alpha_2 > \frac{N\phi e^2}{\rho(1 - e) + e\phi(1 + Ne)} \approx Ne\,\frac{e\phi}{\rho} \tag{9}$$


Equation (9) demonstrates that $0 < \alpha_2 < \alpha_1 < \alpha_0 = \tfrac{1}{2}$. In this case $\alpha_2 \approx Ne\,\alpha_1$.

If we repeat this calculation for games with more punishment stages, we find
that, although punishment during the last stage of the game is never favored by
payoff-biased transmission alone, any positive amount of conformist transmission
($\alpha > 0$) will, for some finite number of stages, overcome payoff-biased
transmission and stabilize punishment. For any value $i$ ($i > 0$), the amount of
conformist transmission required to stabilize punishment at the $i$-th stage is:


$$\alpha_i > \frac{\phi e\,(Ne)^{i-1}}{\rho(1 - e) + e\phi\,\big[1 + (Ne)^{i-1}\big]} \tag{10}$$


Equation (10) shows that the minimum amount of conformism necessary to
stabilize punishment during the last stage, $\alpha_i$, gets smaller and smaller for greater
values of $i$ (assuming $e < 1/N$).
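The geometric decline of the threshold can be illustrated with a short Python sketch. It assumes equation (10) reads $\alpha_i > \phi e (Ne)^{i-1} / (\rho(1-e) + e\phi[1 + (Ne)^{i-1}])$. The helper `stages_needed`, which searches for the smallest number of punishment stages that a given $\alpha$ can stabilize, is a hypothetical convenience and not part of the model; the parameter values are arbitrary.

```python
def alpha_i(i, phi, rho, e, N):
    """Threshold conformity for punishment at the last stage i >= 1, equation (10)."""
    decay = (N * e) ** (i - 1)
    return phi * e * decay / (rho * (1 - e) + e * phi * (1 + decay))


def stages_needed(alpha, phi, rho, e, N, max_stages=50):
    """Smallest i whose threshold lies below alpha, or None if none is found."""
    for i in range(1, max_stages + 1):
        if alpha_i(i, phi, rho, e, N) < alpha:
            return i
    return None


# Illustrative values (assumptions) with e < 1/N, so that N * e < 1.
phi, rho, e, N = 0.2, 0.5, 0.01, 10
thresholds = [alpha_i(i, phi, rho, e, N) for i in range(1, 5)]
stages_for_weak_conformity = stages_needed(0.001, phi, rho, e, N)
```

Each additional stage shrinks the threshold by roughly a factor of $Ne$, so even a conformist bias of 0.001 is enough here once a second punishment stage exists.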

Once conformist transmission overcomes payoff-biased transmission and
stabilizes punishment at stage $i$, punishment at stage $i-1$ will be stabilized
because nonpunishers at stage $i-1$ will be punished by frequent punishers
during stage $i$. Once punishing strategies are common and stable at stage $i-1$,
frequent punishers at $i-1$ will cause payoff-biased transmission to favor the
prosocial variant at stage $i-2$. In most cases, a combination of punishment and
conformist transmission will eventually stabilize cooperation at stage 0.
However, if $C$ is sufficiently greater than $N\rho(1 - e)$, then stable punishment at
stage 1 will not be able to overcome the costs of cooperation at stage 0, and
cooperation will not be maintained, despite stable, high-frequency first-stage
punishers.

Formal Stability Analysis

A more rigorous local stability analysis of the complete set of recursions supports
the heuristic argument just given. Consider the set of $i + 1$ difference equations
where $\Delta p_j$ ($j = 0, 1, \ldots, i$; see the Appendix) provides the dynamics of the
behavioral traits at each stage. The cooperative equilibrium point ($p_0 = 1$,
$p_1 = 1, \ldots, p_i = 1$) is locally stable under two distinct conditions:

Stability Condition 1

When $i > 0$ and $C < \rho(1 - e)N + (eN)^i\phi$, the cooperative equilibrium is locally
stable when:


$$d_0 = -\alpha + (1 - e)(1 - \alpha)\,\beta\,\phi\,(Ne)^i < 0 \tag{11}$$

where $\beta = 1/(N(1 - e)(\rho(1 - e) + e\phi))$. First, note that if $\alpha = 0$, the cooperative
equilibrium is never stable because all the parameters involved are always
positive. However, as long as $\alpha$ is positive and $e < 1/N$, then the system of equations
will be stable for some finite value of $i$. Substituting in the value of $\beta$, and solving
equation (11) for $\alpha$, we find that the minimum value of $\alpha$ is:

$$\alpha > \frac{\phi e\,(Ne)^{i-1}}{\rho(1 - e) + e\phi\,\big[1 + (Ne)^{i-1}\big]} \tag{12}$$

which is the same value, given in equation (10), derived using a less formal 
argument. 
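The algebra connecting the stability condition to the threshold can be spelled out as follows. This is a sketch that assumes equation (11) reads $-\alpha + (1-e)(1-\alpha)\beta\phi(Ne)^i < 0$ with $\beta = 1/(N(1-e)(\rho(1-e)+e\phi))$; the shorthand $X$ is introduced here only for convenience and does not appear in the original.

```latex
\begin{align*}
-\alpha + (1-e)(1-\alpha)\,\beta\,\phi\,(Ne)^{i} &< 0 \\
\alpha &> (1-\alpha)\,X, \qquad X \equiv (1-e)\,\beta\,\phi\,(Ne)^{i} \\
\alpha &> \frac{X}{1+X} \\
X &= \frac{(1-e)\,\phi\,(Ne)^{i}}{N(1-e)\,[\rho(1-e)+e\phi]}
   = \frac{\phi e\,(Ne)^{i-1}}{\rho(1-e)+e\phi} \\
\alpha &> \frac{\phi e\,(Ne)^{i-1}}{\rho(1-e)+e\phi\,[1+(Ne)^{i-1}]}
\end{align*}
```

The last line follows from substituting $X$ into $X/(1+X)$ and multiplying the numerator and denominator by $\rho(1-e)+e\phi$.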


Stability Condition 2 

However, if $C > \rho(1 - e)N + (eN)^i\phi$ and $i > 0$, then the cooperative equilibrium
is stable when:


$$d_0 = -\alpha + (1 - \alpha)(1 - e)\,\beta\,\big(C - (1 - e)N\rho\big) < 0 \tag{13}$$

If we then solve this for the values of $\alpha$ that create a stable cooperative
equilibrium, we find:


$$\alpha > \frac{\beta(1 - e)\big(C - (1 - e)N\rho\big)}{1 + \beta(1 - e)\big(C - (1 - e)N\rho\big)} \tag{14}$$

Under stability condition 2, $\beta = 1/(C(1 - e))$, so: 3


$$\alpha > \frac{1 - [N\rho(1 - e)/C]}{2 - [N\rho(1 - e)/C]} \tag{15}$$


The term $N\rho(1 - e)/C$ is always between zero and one, so the required $\alpha$ is
always less than $\tfrac{1}{2}$. This means that, even when the expected cost of being
punished by everyone does not exceed the cost of cooperation (or the cost saved
by defecting), the cooperative equilibrium can still be favored. Intuitively, this is
the case in which conformist transmission and punishment combine to overcome
the cost of cooperation. As with the previous condition, however, it is
conformist transmission that stabilizes $i$-th stage punishment, which in turn stabilizes
first-stage punishment.
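A small numerical check of this bound, written as a Python sketch, assumes equation (15) reads $\alpha > (1 - N\rho(1-e)/C)/(2 - N\rho(1-e)/C)$. The function name and the parameter values are assumptions; the values are chosen so that the cost of cooperation exceeds the expected cost of punishment, as stability condition 2 requires.

```python
def alpha_condition_2(N, rho, e, C):
    """Threshold conformity under stability condition 2, equation (15)."""
    ratio = N * rho * (1 - e) / C   # assumed to lie between 0 and 1
    return (1 - ratio) / (2 - ratio)


# Illustrative values (assumptions): punishment by everyone costs a defector
# about half of what cooperating would have cost.
N, rho, e, C = 10, 0.05, 0.01, 1.0
threshold = alpha_condition_2(N, rho, e, C)

# The threshold falls from 1/2 toward 0 as punishment covers more of the cost.
sweep = [alpha_condition_2(N, r, e, C) for r in (0.0, 0.025, 0.05, 0.1)]
```

With no punishment at all (`rho = 0`) the threshold returns to one half, which agrees with the earlier result for cooperation without punishers.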

At first, stability condition 2 may seem strange, but the world is seemingly
full of cases in which the costs of being punished seem insufficient to explain the
observed degree of cooperation. Hence, this may illuminate such things as why
Americans pay too much in taxes (i.e., more than they should assuming most
people pay because they fear punishment; Skinner and Slemrod, 1985), why
Americans wait in line, why the Ache share meat (Kaplan and Hill, 1985), and
why people bother going to the voting booth (Mueller, 1989), all of which
seem overly cooperative, given the expected penalty. As we show below, this may be
important from a cultural group selection perspective because groups that
minimize the costs of punishing and being punished ($\rho$ and $\phi$), while still
maintaining cooperation, will do better than those that rely heavily on punishment to
maintain cooperation.


Once Cooperation Is Stabilized, It Can Spread by 
Cultural Group Selection 

By itself, the present model does not provide an explanation for human
cooperation. We have shown that, under plausible conditions, a relatively weak
conformist tendency can stabilize punishment and therefore cooperation.
However, noncooperation and nonpunishment are also an equilibrium of the model,
and we have given no reason, so far, why most populations should stabilize at the
cooperative equilibrium rather than the noncooperative equilibrium. But
when there are multiple stable cultural equilibria with different average payoffs,
cultural group selection can lead to the spread of the higher payoff equilibrium. As
we have demonstrated, cultural evolutionary processes will cause groups to exist
at different behavioral equilibria. This means that different groups have
different expected payoffs (due to different degrees of economic production, for
example). The expected payoff of individuals from cooperative groups is
$b \approx (1 - e)[B - C - eN(\phi + \rho(1 + i))]$, while the expected payoff of individuals in
noncooperative/nonpunishing groups is zero. Thus, cooperative groups will have
a higher average payoff as long as the benefits of cooperation are bigger than the
costs of cooperation and punishment. The combination of conformism and
payoff-biased transmission must also be strong enough to maintain stable
cooperation in the face of migration between groups. Such persistent differences
between groups create the raw materials required by cultural group selection.
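As a rough illustration of the payoff comparison between groups, the sketch below evaluates the expected payoff in a cooperative group, assuming the expression reads $b \approx (1-e)[B - C - eN(\phi + \rho(1+i))]$, and compares it with the zero payoff of a noncooperative group. The function name and all numbers are assumptions made for the example.

```python
def cooperative_group_payoff(B, C, phi, rho, e, N, i):
    """Approximate expected payoff per individual in a cooperative group."""
    return (1 - e) * (B - C - e * N * (phi + rho * (1 + i)))


# Illustrative values (assumptions): benefit, costs, error rate, group size,
# and number of punishment stages.
payoff = cooperative_group_payoff(B=3.0, C=1.0, phi=0.2, rho=0.5, e=0.01, N=10, i=2)
noncooperative_payoff = 0.0
cooperation_pays = payoff > noncooperative_payoff
```

Because errors are rare, the punishment terms subtract only about 0.17 from the net benefit of 2 in this example, so the cooperative group keeps a clear payoff advantage.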

Cultural group selection can operate in a number of ways to spread proso- 
cial behaviors. Cooperative groups will have higher total production and con- 
sequently, more resources that can support more rapid population growth 
relative to noncooperative groups. Or cooperative groups may be better able to 
marshal and supply larger armies than noncooperative groups and hence be more 
successful in warfare and conquest. However, although these factors may be
important (see Bowles, 2000), another, slightly subtler, cultural group
selection process may also be significant. Payoff-biased imitation means people will
preferentially copy individuals who get higher payoffs. The higher an individual's
payoff, the more likely that individual is to be imitated. If individuals have
occasion to imitate people in neighboring groups, people from cooperative
populations will be preferentially imitated by individuals in noncooperative
populations because the average payoff to individuals from cooperative populations
is much higher than the average payoff of individuals in noncooperative
populations. Boyd and Richerson (2000) have shown that, under a wide range of
conditions (and fairly quickly), this form of cultural group selection will
deterministically spread group-beneficial behaviors from a single group (at a
group-beneficial equilibrium) through a meta-population of other groups, which were
previously stuck at a more individualistic equilibrium.


Culturally Evolved Cooperation May Cause Genes for 
Prosocial Behavior to Proliferate 

Once the cooperative equilibrium becomes common, it is plausible that natural 
selection acting on genetic variation will favor genes that cause people to co- 
operate and punish — because such genes decrease an individual’s chance of 
suffering costly punishment. This could arise in many ways. Individuals might 
develop a preference for cooperative or punishing behaviors that increases their 
likelihood of acquiring such behaviors. Or, alternatively, natural selection might 
increase the reliance on conformist transmission, making people more likely to 
acquire the most frequent behavior. 

Here, we analyze the case in which the probability of mistakenly defecting 
or not punishing, e, varies genetically. We assume that cultural evolution is much 
faster than genetic evolution, which implies that the population exists at a 
culturally evolved cooperative equilibrium. Further assume that while most 
individuals still make errors at the rate e, rare mutant individuals have a slightly 
different error probability of $e'$ ($= e - \varepsilon$), where $\varepsilon$ is small ($|\varepsilon| \ll e$). If we assume 
that an individual’s average payoff, b, is proportional to her average genetic 
fitness, then we can ask whether prosocial mutants will spread. The expected 
fitnesses for the two types, $F$ and $F_m$ (“m” for mutant), and the difference 
between them, $\Delta F$, are as follows (assuming i > 0): 

$$F \approx (1 - e)(B - C) - eN[\phi + \rho(1 - e)(i + 1)],$$

$$F_m \approx B(1 - e) - C(1 - e') - N[e\phi + e'\rho(1 - e)(i + 1)],$$

$$\Delta F = F_m - F = \varepsilon[N\rho(1 - e)(i + 1) - C]. \qquad (16)$$
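
The difference between the two fitness expressions can be checked numerically. The sketch below is only an illustration: it reads the expressions as F ≈ (1 − e)(B − C) − eN[φ + ρ(1 − e)(i + 1)] and F_m ≈ B(1 − e) − C(1 − e′) − N[eφ + e′ρ(1 − e)(i + 1)], with e′ = e − ε, and the parameter values are hypothetical numbers chosen for the check, not estimates from the chapter.

```python
def fitness_common(B, C, N, rho, phi, e, i):
    """Expected fitness F of the common type, which errs at rate e."""
    return (1 - e) * (B - C) - e * N * (phi + rho * (1 - e) * (i + 1))


def fitness_mutant(B, C, N, rho, phi, e, e_mut, i):
    """Expected fitness F_m of a rare mutant that errs at rate e_mut."""
    return (B * (1 - e) - C * (1 - e_mut)
            - N * (e * phi + e_mut * rho * (1 - e) * (i + 1)))


def delta_f(C, N, rho, e, eps, i):
    """Closed form for F_m - F when e_mut = e - eps."""
    return eps * (N * rho * (1 - e) * (i + 1) - C)


# Hypothetical parameter values, used only to exercise the formulas.
B, C, N, rho, phi, e, i = 10.0, 1.0, 10, 0.5, 0.2, 0.1, 1
eps = 0.01  # the mutant makes slightly fewer errs than the common type

direct = (fitness_mutant(B, C, N, rho, phi, e, e - eps, i)
          - fitness_common(B, C, N, rho, phi, e, i))
closed = delta_f(C, N, rho, e, eps, i)
print(direct, closed)
```

Subtracting the two expressions cancels every term in B and φ, so the sign of the difference depends only on whether C is below Nρ(1 − e)(i + 1); the two printed values should agree up to floating-point error.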

When $\Delta F$ is positive, prosocial genes can invade. If $C < (1 - e)N\rho + (eN)^i\phi$ 
(stability condition 1), then C is always less than $N\rho(1 - e)(i + 1)$, and prosocial 
genes are always favored. Once at fixation, these prosocial genes cannot be in- 
vaded by more error-prone, anti-social individuals. 

In stability condition 2, where $C > (1 - e)N\rho + (eN)^i\phi$, prosocial genes are 
favored (for i > 0) when: 

$$1 + \frac{(eN)^i\phi}{N\rho(1 - e)} < \frac{C}{N\rho(1 - e)} < i + 1, \qquad (17)$$

which is a wide range, since the smallest possible value of i is 1. However, there 
exists a range of conditions in which culturally evolved cooperation is stable, but 
prosocial genes cannot invade — in fact, anti-social genes (genes favoring more 
mistakes) may invade. This occurs when (for i > 0): 


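$$\cdots < N\rho(1 - e) \qquad (18)$$

[Condition (18) is only partly legible in the source; the surviving fragments are the bound "< Nρ(1 − e)" and the labels "No prosocial" and "Stability".]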


When condition (18) holds, cultural transmission will stabilize cooperation, but 
prosocial genes will not be able to invade — instead, anti-social genes will be 
favored (i.e., $\varepsilon$ is negative). Note, however, that the minimum value of $\alpha$ for this 
condition to exist requires $\alpha > 0.333$, which occurs when i = 1. Generally, we 
believe $\alpha$ is much smaller than this, but we will await the verdict of future em- 
pirical work. Interestingly, this anti-social invasion is likely to occur in the groups 
most favored by cultural group selection — those who maximize group payoff by 
minimizing punishment costs (and i), without destabilizing cooperation. Un- 
fortunately, anti-social invasion will decrease average payoffs and may eventually 
destabilize cooperation. Further work on this gene-culture interaction will re- 
quire coevolutionary models that combine both cultural and genetic evolu- 
tionary processes (perhaps using quantitative traits) and particularly the cultural 
group selection process we have described. 

As we have begun to model it here, prosocial genes are not strongly selected 
against in noncooperative populations because error making, in terms of mis- 
taken cooperation and punishment, occurs only when individuals adopt prosocial 
traits — defectors do not mistakenly cooperate. So, if the world is a mix of co- 
operative and noncooperative populations, prosocial genes will be favored in a 
wide range of circumstances in cooperative populations and will be compara- 
tively neutral in noncooperative populations. It is possible that incorporating 
defector errors, in the form of mistaken cooperation or punishment, may affect 
this prediction. Furthermore, cooperation may not be a dispositional trait of 
individuals, but rather a specific behavior or value tied only to certain cultural 
domains. Some cultural groups, for example, may cooperate in fishing and house- 
building but not warfare. Other groups may cooperate in warfare and fishing but 
not house-building. Such culturally transmitted traits would have the form 
“cooperate in fishing,” “cooperate in house-building,” and “do not cooperate 
in warfare,” rather than the more dispositional approach of simply “cooperate” 
versus “do not cooperate.” If this is the case, then the migration and spread 
of prosocial genes becomes more difficult. As prosocial genes spread among 
groups with different stable cooperative domains, individuals with such genes 
would be more likely to mistakenly cooperate in noncooperative cultural do- 
mains. For example, in cultures where people cooperate in fishing but not 
warfare, individuals with prosocial genes may be more likely to mistakenly 
cooperate in warfare (and pay the cost), as well as less likely to mistakenly defect 
in cooperative fishing. We intend to pursue these avenues in subsequent work. 


Conclusion 

We have done three things in this chapter. First, we have shown that, if humans 
possess a psychological bias toward copying the majority, as well as a bias toward 
imitating the successful, then cultural evolutionary processes will stabilize co- 
operation and punishment for some finite number of punishment stages. Second, 
we discussed how, once cooperation is stable, a particular form of cultural group 
selection is likely to spread these group-beneficial cultural traits through human 
populations. And finally, we have demonstrated that prosocial genes, which 
cannot otherwise spread, can invade in the wake of these cultural evolutionary 
processes, under a wide range of conditions. 

APPENDIX 

NOTES 

We would like to thank Natalie Smith, Herbert Gintis, and the anonymous re- 
viewers for their assistance and suggestions in preparing this chapter. 

1. Two other explanations for cooperation go by the handles by-product mutu- 
alism (Brown, 1983) and group selection (Sober and Wilson, 1998). In by-product 
mutualism, individuals who “cooperate” get a higher payoff (have a higher expected 
fitness) than noncooperators. The cooperative contribution to the fitness of others is 
simply a by-product of narrow self-interest. That is, in the process of helping myself, 
I also help you “by accident.” Hence, although this situation may abound in nature, 
it is not the situation we are interested in (and not cooperation by many definitions). 
And, while genetic group selection may explain some cooperation in nature (e.g., 
honeybees; see Seeley, 1995), we believe that gene flow rates between human pop- 
ulations, relative to selection, are too high to maintain the required variation between 
groups (Richerson and Boyd, 1998). 

2. Note that under a small range of conditions, when $C > N(\rho(1 - e) + e\phi)$, the 
system can still remain stable. Under these conditions, however, $\beta$ becomes 
$1/(C(1 - e))$. For simplicity, we leave this nuance until later in the chapter. 

3. Actually, there is a tiny range, $N\rho(1 - e) + \phi(eN)^i < C < N\rho(1 - e) + N\phi e$, 
under which $\beta$ still equals $1/(N(1 - e)(\rho(1 - e) + e\phi))$. Nothing particularly 
interesting happens in this range, so we will not discuss it. Note that if i = 1, the range 
is nonexistent. 

4. If conformist transmission alone can stabilize cooperation without any 
punishment (i = 0), then $\Delta F < 0$, and prosocial genes will never spread. 


11 Can Group-Functional Behaviors Evolve by Cultural 
Group Selection? 

An Empirical Test 

With Joseph Soltis 


Many anthropologists explain human behavior and social institu- 
tions in terms of group-level functions (Rappaport, 1984; Lenski and Lenski, 
1982; Harris, 1979; Radcliffe-Brown, 1952; Aberle, Cohen, Davis, Levy, and 
Sutton, 1950; Malinowski, 1984 [1922]; Spencer, 1891). According to this view, 
beliefs, behaviors, and institutions exist because they promote the healthy 
functioning of social groups. Such functionalists believe that the existence of an 
observed behavior or institution is explained if it can be shown how the behavior 
or institution contributes to the health or welfare of the social group. Most 
functionalists in anthropology have not explained how group-beneficial beliefs 
and institutions arise or by what processes they are maintained (Turner and 
Maryanski, 1979). When functionalists do provide a mechanism for the gener- 
ation or maintenance of group-level adaptations, it is usually in terms of selection 
among social groups. 1 Functionalists believe that societies have many functional 
prerequisites. Social groups whose culturally transmitted values, beliefs, and 
institutions do not provide for these prerequisites become extinct, leaving only 
those societies with functional cultural attributes as survivors. We refer to this 
process as “cultural group selection” because it involves the differential survival 
and proliferation of culturally variable groups. 

Cultural group selection is analogous to genetic group selection but acts on 
cultural rather than genetic differences between groups. This distinction is im- 
portant. We will argue that cultural variation is more prone to group selection 
than genetic variation and that this may explain why human societies, in contrast 
to those of other animals, are frequently cooperative on scales far larger than kin 
groups. More generally, recent theoretical work on the processes of cultural 
evolution shows that there are many parallels between cultural and genetic 
evolution but also some fundamental differences (Durham, 1991; Boyd and 
Richerson, 1985; Cavalli-Sforza and Feldman, 1981; Pulliam and Dunford, 1980). 
To date, empirical investigations focused on these processes are few (but see, e.g., 
Cavalli-Sforza, Feldman, Chen, and Dornbusch, 1982). In addition to conducting 
empirical studies specifically designed to investigate these processes, it is possible 
to use many of the data collected by social scientists for other purposes. Here we 
use a small part of the very rich ethnographic record produced by anthropologists 
to test the empirical plausibility of the process of cultural group selection. 

As emphasized by Campbell (1965, 1975, 1983), cultural group selection 
requires that (1) there be cultural differences among groups, (2) these differ- 
ences affect persistence or proliferation of groups, and (3) these differences be 
transmitted through time. If these three conditions hold, then, other things being 
equal, cultural attributes that enhance the persistence or proliferation of social 
groups will tend to spread. There is no guarantee, however, that this process will 
be sufficiently powerful to overcome other social processes that act to produce 
other outcomes. There are two problems with cultural group selection as an 
explanation for the existence of group-beneficial traits: maintenance of variation 
among groups and rate of adaptation. 

Group-functional explanations may be in conflict with the fact that human 
choices are at least partly self-interested. To the extent that they can evaluate 
alternative beliefs and attitudes, self-interested organisms should adopt only 
beneficial attitudes and beliefs and reject those that are individually harmful. 
Thus, beliefs that are costly to the individual should diminish, while beliefs that 
are beneficial to individuals should spread. Extensive theoretical analysis suggests 
that group selection can counteract this process only if groups are very small and 
migration among groups is very limited (Eshel, 1972; Levin and Kilmer, 1974; 
Wade, 1978; Slatkin and Wade, 1978; Boorman and Levitt, 1980; Wilson, 1983; 
Aoki, 1982; Rogers, 1990). As a result, most evolutionary biologists and social 
scientists influenced by them (e.g., Chagnon and Irons, 1979) reject functionalist 
explanations. 

Furthermore, Hallpike (1986) has argued that group extinction does not 
occur often enough to justify functionalist explanations. Group selection works 
by eliminating those societies that have deleterious practices or institutions. If it 
takes a particular number of extinctions to eliminate a deleterious ritual form, 
then it will take a greater number to eliminate the deleterious ritual form and a 
deleterious marriage practice. Still further extinctions will be required to cause 
other aspects of the society to become adaptive. Hallpike argues that human 
societies do not have high enough extinction rates for group selection to cause 
many different attributes to be adaptive at the group level simultaneously. 

In the face of these objections, is there any justification for taking group- 
functional hypotheses seriously? Here we describe a theoretical model and 
present supporting data which show that a role for cultural group selection 
should not be ruled out. Boyd and Richerson (1985, chs. 7 and 8, 1990a, b) have 
analyzed mathematical models of group selection acting on culturally transmitted 
variation and have shown that cultural group selection will work if certain key 
assumptions are met. Ethnographic data from Papua New Guinea and Irian Jaya 
give credence to some of the assumptions that underpin the group-selection 
model. These data also allow us to estimate an upper bound on the rate of 
adaptation that could result from group selection. We argue that these data 
suggest that group selection is too slow to be used to justify the common practice 
of interpreting as group-beneficial the detailed aspects of particular cultures. 
However, the data do not exclude the possibility that group selection may account 
for the gradual evolution of some group-level adaptations, such as complex 
social institutions, over many millennia. 


How Group Selection Can Work 

We begin with the premise that individuals acquire various skills, beliefs, atti- 
tudes, and values from other individuals by social learning and that these “cul- 
tural variants,” together with their genotypes and environments, determine their 
behavior. To understand why people behave as they do in a particular envi- 
ronment, we must know the skills, beliefs, attitudes, and values that they have 
acquired from others by cultural inheritance. To do this, we must account for 
the processes that affect cultural variation as individuals acquire cultural traits, 
use the acquired information to guide behavior, and act as models for others. 
What processes increase or decrease the proportion of persons in a society who 
hold particular ideas about how to behave? Here we will consider two kinds of 
processes: biased cultural transmission and selection among social groups. 

Biased cultural transmission occurs when individuals preferentially adopt 
some variants relative to others. Individuals may be exposed to a variety of 
beliefs or behaviors, evaluate these alternatives according to their own goals, and 
preferentially imitate those variants that seem best to satisfy their goals. If many 
of the individuals in a population have similar goals, this process will cause the 
cultural variants that best satisfy these goals to spread. For example, if the two 
variants are more and less restrictive forms of food taboos and individuals prefer 
the broader diet that results from the less restrictive variant, then that variant 
will spread. This process, which is important in the spread of innovations (Rogers, 
1983), often tends to cause groups living in similar environments to have similar 
behaviors. 

However, biased cultural transmission can also maintain differences between 
groups of people living in similar environments. This can occur in two ways: first, 
a belief or behavior may be more attractive if it is more widely used than the 
alternatives. Many social behaviors have this character. For example, if food 
taboos are used as ethnic markers, then in a group in which the more restrictive 
taboo predominates, individuals may choose that taboo over the less restrictive 
one because the social benefits compensate for the nutritional costs. Game theory 
suggests that many kinds of social interactions, including bargaining, contests, and 
punishment-enforced norms, will generate an astronomical number of alternative 
equilibria. Second, when individuals are unable to evaluate the merits of alter- 
native variants, they may instead use a simple rule of thumb such as adopting the 
most common variant. This conformist form of biased transmission causes the 
more common variant to increase. For example, if the majority of a group ob- 
serves the more restrictive taboo, it will tend to increase. 
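
The tendency of conformist transmission to push the more common variant toward fixation can be sketched with a simple recursion. The form used below, p' = p + D p(1 − p)(2p − 1), is one standard way of writing conformist bias in the cultural evolution literature; the strength parameter `D`, the starting frequencies, and the function names are assumptions chosen for illustration rather than values from this chapter.

```python
def conformist_step(p, D):
    """One generation of conformist-biased transmission.

    p is the frequency of the more restrictive taboo in the group;
    D (0 < D <= 1) is an assumed strength of the conformist bias.
    The variant held by the majority (p > 0.5) gains; a minority variant loses.
    """
    return p + D * p * (1.0 - p) * (2.0 * p - 1.0)


def iterate(p0, D, generations):
    """Apply conformist_step repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = conformist_step(p, D)
    return p


# Two groups that differ only in which taboo starts out in the majority.
majority_start = iterate(0.6, D=0.3, generations=200)
minority_start = iterate(0.4, D=0.3, generations=200)
print(majority_start, minority_start)
```

Two groups in identical environments end up at opposite extremes depending only on their starting frequencies, which is how this kind of bias can preserve differences between neighboring groups.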

When either common-type advantage or conformity maintains differences 
among groups, group selection can be an important force. Consider a large pop- 
ulation sub-divided into many smaller, partially isolated groups. Suppose that 
biased cultural transmission maintains cultural differences among these groups 
despite frequent contact and occasional intermarriage and that these cultural 
differences affect the welfare of the group. For example, groups in which re- 
strictive food taboos are common may tend to harvest game at approximately 
the maximum sustainable yield, while groups in which less restrictive taboos are 
common overexploit their game resources and suffer significantly poorer nutri- 
tional status as a result. Further suppose that social groups are occasionally 
disrupted and their members dispersed to other local groups and that the rate 
at which this occurs depends on the overall welfare of the group. Such disrup- 
tion and dispersal may be the result of population decline, social discord, or the 
actions of aggressive neighbors. Poor nutritional status will contribute to these 
risks. Thus, according to our hypothetical example, groups with less restric- 
tive food taboos will, on average, be more likely to be broken up and dispersed. 
Finally, suppose that as some groups decline and disappear, other groups grow 
and eventually divide, forming new groups, and that the rate at which this occurs 
increases with the overall welfare of the group. Thus, the growing, dividing 
groups will tend to have more restrictive food taboos than declining ones, and 
restrictive food taboos will tend to spread as a result of selection among groups. 
Others have proposed at least implicitly similar models (e.g., Peoples, 1982; 
Divale and Harris, 1976; Irons, 1975). 

This model of group selection differs from those analyzed in population 
biology in that biased transmission maintains variation among groups. Biologists 
have been concerned with whether group selection could allow the evolution of 
altruistic behavior. In these models, natural selection acts against altruistic be- 
havior in every group, and this selection process tends to reduce variation among 
groups. The only process creating variation among groups is genetic drift, a very 
weak force. Thus, group selection can have little effect because groups are ge- 
netically very similar. In the model outlined here, it is assumed that various 
forms of biased transmission, potentially very strong individual-level forces, act 
to maintain differences among groups and group selection can predominate. 

The form of group selection just outlined can be a potent force even if 
groups are usually very large. For a favorable cultural variant to spread, it must 
become common in an initial subpopulation. The rate at which this will occur 
through random driftlike processes (Cavalli-Sforza and Feldman, 1981) will be 
slow for sizable groups (Lande, 1986). However, this need occur only once. 
Thus, even if groups are usually large, occasional population bottlenecks may 
allow group selection to get started. Similarly, environmental variation in even a 
few subpopulations may provide the initial impetus for group selection. Some 
environments may lead groups to adopt group-beneficial traits because they are 
also individually advantageous. These practices may then spread by group selec- 
tion into environments where they have only a group advantage. For example, 
restrictive food taboos may arise in a very heterogeneous environment in which 
it is important for individuals to specialize in narrow-range food-procurement 
strategies and only later spread by group selection to less heterogeneous 
environments where they mainly function to protect resources against the 
tragedy of the commons. 

Unlike many genetic models, this form of group selection does not require 
that the people who make up groups die during group extinction. All that is 
required is the disruption of the group as a social unit and the dispersal of its 
members throughout the metapopulation. Such dispersal has the effect of cul- 
tural extinction, because dispersing individuals have little effect on the frequency 
of alternative behaviors in the future; in any one host subpopulation, they will be 
too few to tip it from one equilibrium maintained by convention or conformity 
to another. 

Cultural group selection is very sensitive to the way in which new groups are 
formed. If new groups are mainly formed by individuals from a single preexisting 
group, then the behavior with the lower rate of extinction or higher level of 
contribution to the pool of colonists can spread even when it is rare in the 
metapopulation. If, instead, new groups result from the association of individuals 
from many other groups, group selection cannot act to increase the frequency of 
rare strategies. 


Empirical Evidence 

To justify using this model of cultural group selection we need data that allow us 
to answer three questions: 

1. Do groups suffer disruption and dispersal at a rate high enough to 
account for the evolution of any important attributes of human 
societies? 

2. Are new groups formed mainly by fission in groups that avoid ex- 
tinction? 

3. Are there transmissible cultural differences among groups that af- 
fect their growth and survival, and do these differences persist long 
enough for group selection to operate? 

To address these questions we present data on group extinction rates, group 
formation, and cultural variability drawn from the ethnographic literature of 
Irian Jaya and Papua New Guinea. We have chosen this area because it offers 
high-quality ethnographic descriptions of peoples that had not been pacified by 
a colonial administration. Colonialism is suspected by some to increase rates of 
intergroup conflict in stateless societies, casting doubt on data from areas like the 
American Plains, where contact predated good ethnography. New Guinea is 
unique in the amount of good ethnography obtained within a few years of first 
contact with complex societies. We have focused on pre-state societies because 
they are characteristic of more of human history than more complex societies, 
and the basic institutions of human societies evolved under stateless conditions. 

We have made an effort to sample as many ethnographies as possible, fo- 
cusing on those dealing with pre-contact warfare among indigenous peoples. We 
have chosen to focus on warfare only because it is a conspicuous way in which 
groups become extinct and is likely to be recorded. Even where defeat in war is 
the proximate cause of an extinction, a variety of other factors may have pre- 
cipitated the event by causing the defeated group to decline in numbers. Ex- 
tinction through war may be the common fate of groups that have declined for 
some other reason. 

We define a group as a territorial population that can conduct warfare as a 
unit. An extinction is said to occur when (1) all members of a group are killed or 
(2) members of a group are assimilated into another group either wholly or in 
part. When a group is routed from its territory but remains intact as a social unit 
(or its fate is unknown), then a forced migration, not an extinction, is said to 
have occurred. 
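
The coding rule just stated can be written out explicitly. The sketch below is only one way to encode it; the names `Outcome` and `classify_event` and the three boolean flags are hypothetical labels introduced for illustration, not categories used in the ethnographic sources.

```python
from enum import Enum


class Outcome(Enum):
    EXTINCTION = "extinction"
    FORCED_MIGRATION = "forced migration"
    NO_EVENT = "no event"


def classify_event(all_killed, assimilated, routed):
    """Apply the definitions in the text to one recorded episode.

    all_killed:  every member of the group was killed
    assimilated: members were absorbed into another group, wholly or in part
    routed:      the group was driven from its territory
    A routed group that stays intact, or whose fate is unknown, counts as a
    forced migration rather than an extinction.
    """
    if all_killed or assimilated:
        return Outcome.EXTINCTION
    if routed:
        return Outcome.FORCED_MIGRATION
    return Outcome.NO_EVENT


# A group routed and later absorbed by its hosts is coded as an extinction.
absorbed = classify_event(all_killed=False, assimilated=True, routed=True)
# A group routed but still intact as a social unit is coded as a forced migration.
displaced = classify_event(all_killed=False, assimilated=False, routed=True)
print(absorbed.value, displaced.value)
```

Checking assimilation before routing reflects the order of the definitions: a rout counts as a forced migration only when neither extinction criterion is met.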

Group Extinction 

To estimate the rate of group extinction for a region, three types of information 
are needed: (1) the number of extinctions, (2) the number of years over which 
the extinctions took place, and (3) the number of groups among which the 
extinctions took place. We were able to assemble this information for five re- 
gions in Irian Jaya and Papua New Guinea. 


The Mae Enga 

The Mae Enga live in the Central Western Highlands, where population density 
averages 40 to 43 persons/km² but reaches densities of over 100 persons/km² 
(Meggitt, 1962:158, 1977:1). The immediate causes of war (Meggitt, 1977:13) 
are land disputes (58 percent), other property disputes (24 percent), homicide 
(15 percent), and problems related to sexual jealousy (3 percent). Meggitt re- 
corded a 50-year warfare history for 14 Mae Enga clans. In the 29 conflicts for 
which the outcome was known, there were five extinctions. Extinctions did not 
result from the killing of all group members; routed clan members were forced 
to disperse and find refuge among other clans, often with kin (1977:15, 25-27). 
There is evidence that these immigrants became culturally assimilated into their 
host group, usually within a generation (Meggitt, 1965:31-35). Rapid assimila- 
tion occurred because true clan members received unqualified land rights, as well 
as economic, ritual, and military aid. As Meggitt (1977:190) notes, “Members of 
defeated and dispersed groups who have gone to live elsewhere have good po- 
litical and economic reasons not to draw attention to their immigrant status but 
instead try for relatively rapid absorption into the host clan. . . . In consequence, 
the identities of extinguished clans or subclans are soon lost to public knowledge 
and in time such groups drop out of the genealogies of their former phratries.” 


The Maring 

The Maring live in the Central Highlands, an area of relatively low population 
densities, averaging less than 20 persons/km² (Vayda, 1971:22). Wars are usually 
triggered by a murder or attempted murder (56 percent of cases). The remaining 
44 percent are fought over land, women, or theft (1971:4). Vayda’s warfare 
history concerns 32 clan-clusters and autonomous clans and has a depth of about 
50 years (Andrew Vayda, personal communication). He mentions 14 wars in 
which victims were routed from their territories. Only in one case was there a 
clear extinction; the other groups eventually returned. However, in two of these 
cases routed clans reclaimed their territory only with the help of the Australian 
police and probably would have become extinct otherwise. Rappaport (1967:26) 
explains that members of vanquished groups who find refuge in another group 
do not maintain their autonomy: “the de facto membership of the living in 
groups with which they have taken refuge is converted eventually into de jure 
membership. Sooner or later the groups with which they have taken up resi- 
dence will have occasion to plant rumbin, thus ritually validating their connec- 
tion to the new territory and their new group.” 

The Mendi 

The Mendi live in the Southern Highlands, where population density is 18 per- 
sons/km² (Meggitt, 1965:272). Ryan (1959) describes, for a 50-year period, the 
history of clan degeneration, extinction, and new group formation for a group 
of nine clans known as the Mobera-Kunjop. In this period there were three clan 
extinctions. In two cases, the clans were routed by warfare and absorbed by other 
groups; in the third a degenerating clan was eventually absorbed by another clan. 

In two cases, vanquished groups did not suffer disruption but managed to 
remain functioning as an intact subclan in their host group. Ryan (1959:271) 
suggests that such accretionary subclans eventually become assimilated into their 
host clan: “The refugee group, consisting of sub-clan brothers and their families, 
may be large enough to assume the immediate status of a subclan. . . . Once the 
people have been accepted, granted land, and have settled down, there is almost 
no further differentiation made between them and the original subclans.” How- 
ever, individual nonagnates suffer discrimination from members of their host 
clans (Ryan, 1959). They are less likely to receive bridewealth support (which 
normally comes from fellow subclan members) than are true group members, and 
therefore refugees have reason to want to assimilate into their host group: “Al- 
though it is asserted that acceptance is complete . . . marriage figures indicate that 
non-agnatic men tend to marry later than agnatic clan members, more of them 
marry only once, and more of them have only one wife at a time” (p. 269). 

The Fore and Usufura 

Berndt (1962) recorded detailed descriptions of war involving groups in four 
adjacent linguistic regions of the Eastern Highlands — the Fore, the Usufura, the 
Jate, and the Kamano. Fore population density is approximately 15 persons/km² 
and that of the Usufura 27 persons/km² (Berndt, 1962:20). No values are given 
for the other linguistic groups. Berndt recorded one extinction during the 10- 
year period preceding his research. The group was routed in warfare and dis- 
persed into several different districts in the area. The number of groups involved 
is slightly ambiguous; Berndt indicates that his warfare data are most complete 
for only 8 districts in the area but mentions some 24 districts in his accounts of 
warfare. 

The Tor 

The Tor region is located on the northern coast of Irian Jaya (Oosterwal, 1961). 
No density figures are provided. Oosterwal recorded a 40-year history for the 26 
tribal territories in the Tor region. Four tribes suffered extinction either through 
peaceful absorption, military defeat and dispersal, or outright extermination 
(Oosterwal 1961:21-26). In one of the extinctions, Oosterwal is clear about the 
cultural assimilation of the extinct group: “Formerly the Mander language was 
only spoken by the Mander, but since the Foja have lived together with the 
Mander, they have adopted the Mander language entirely. Save for a small 
number of words, these Foja do not recollect any more of their own language. 
Their kinship terminology is also identical with that of the Mander” (p. 23). 

Table 11.1 summarizes extinction rates for the five regions for which there 
were enough data to compute such estimates. We assume that the number of 
groups remains constant, which means that each extinction is followed by an 
immediate recolonization. To the extent that this assumption is wrong, ex- 
tinction rates will be higher. We found no ethnographies that yielded an ex- 
tinction rate of zero. In our sample, the percentage of groups suffering extinction 
each generation ranges from 1.6 percent to 31.3 percent. 
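
The per-generation percentages can be reproduced from the counts reported for each region. The sketch below assumes, as the text does, a constant number of groups and a 25-year generation; the function and variable names are illustrative only.

```python
def percent_extinct_per_generation(extinctions, groups, years, generation_years=25):
    """Percentage of groups going extinct per generation.

    Scales the observed fraction of groups lost over the recorded period
    to a generation of generation_years, holding the number of groups fixed.
    """
    return 100.0 * (extinctions / groups) * (generation_years / years)


# (groups, extinctions, years) as reported for three regions.
counts = {
    "Mae Enga": (14, 5, 50),
    "Mendi": (9, 3, 50),
    "Tor": (26, 4, 40),
}
rates = {
    region: percent_extinct_per_generation(ext, grp, yrs)
    for region, (grp, ext, yrs) in counts.items()
}

# Where a count is uncertain, evaluate both bounds (Maring: 1 to 3 extinctions).
maring_low = percent_extinct_per_generation(1, 32, 50)
maring_high = percent_extinct_per_generation(3, 32, 50)

for region, rate in rates.items():
    print(f"{region}: {rate:.2f} percent per 25 years")
print(f"Maring: {maring_low:.2f} to {maring_high:.2f} percent per 25 years")
```

The same call with 8 or 24 groups, one extinction, and a 10-year record gives the two ends of the Fore/Usufura range.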

It seems likely that other areas in New Guinea had similar group extinction 
rates. There is mention of group extinction in 54 percent (15/28) of the societies 
sampled. This is no doubt an underestimate, because the failure to mention an 
extinction in an ethnographic account of warfare does not necessarily mean that 
extinctions never occurred. In 89 percent (25/28) of the societies sampled, there 
is mention of either group extinction or forced migration (see table 11.2). The 
near ubiquity of extinction and forced migration in the ethnographic record 
suggests that high rates of extinction were common throughout Papua New 
Guinea and Irian Jaya before pacification. 


Table 11.1. Summary of group extinction rates for five regions of Papua 
New Guinea and Irian Jaya 

Region         Groups   Extinctions   Years   Percentage of groups      Source
                                              extinct every 25 years
Mae Enga       14       5             50      17.9                      Meggitt (1977)
Maring         32       1-3           50      1.6-4.7                   Vayda (1971)
Mendi          9        3             50      16.7                      Ryan (1959)
Fore/Usufura   8-24     1             10      31.3-10.4                 Berndt (1962)
Tor            26       4             40      9.6                       Oosterwal (1961)


Table 11.2. Mentions of group extinction and forced migration in Papua 
New Guinea and Irian Jaya 

People           Extinction   Migration   Source
Mae Enga         +            -           Meggitt (1977:14)
Huli             -            -           Glasse (1959)
Melpa                         +           Strathern (1971:55-56, 67)
Raiapu Enga      +            +           Waddel (1972:37, 186, 263-65)
Wola             +            +           Sillitoe (1977:79)
Maring           +            +           Vayda (1971:11-13)
Ok               +            +           Morren (1986:266-67, 272-73, 278-79)
Kuma             +            +           Reay (1959:7, 27, 32)
Chimbu           -            +           Brown and Brookfield (1959:41, 61, 263-65)
Usufura          -            +           Berndt (1962:242)
Jate             +            +           Berndt (1962:253, 260-61)
Fore             -            +           Berndt (1962:236, 251, 257)
Auyana           +            +           Robbins (1982:213-14)
Kukukuku         -            +           Blackwood (1978:102)
Gahuku           -            +           Read (1955:253-54)
Arapesh          +            +           Tuzin (1976:63)
Abelam           -            +           Lea (1965:196, 205)
Mailu            -            +           Saville (1926)
Kiwai            +            +           Landtman (1970[1927]:148-49, 204)
Dugum Dani       +            +           Heider (1970:119-22)
Ilaga Dani       -            +           Sillitoe (1977:77)
Bokondini-Dani   -            +           Sillitoe (1977:76)
Jale             -            +           Koch (1974:79)
Kapauku          -            -           Pospisil (1963)
Tor              +            +           Oosterwal (1961:21-26, 48)
Jaqai            -            -           Boelaars (1981)
Marind-Anim      +            -           Ernst (1979:36)
Bena Bena        +            -           Langness (1964:174)


New Group Formation 

Group selection is most effective when new groups are made up of members of 
a single existing group rather than of members of many different groups. If new 
groups are formed when a single group generates a daughter group from among 
its own members, then the daughter group will preserve the cultural variants 
common in the mother group. Cultural variants that facilitate daughter-group 
formation will become more common in the region as a whole. 

Societies in Irian Jaya and Papua New Guinea are characterized by a seg- 
mentary social system (Langness, 1964). When members of a social group be- 
come too numerous, the group may split into two similar groups. Conversely, 
when members of a social group become too few, they may be absorbed by 
another group at a lower segmentary level (Brown, 1978:184-185, 187-188). 
There are numerous anecdotal accounts of new group formation (e.g., Brown 
and Brookfield, 1959:57; Sillitoe, 1977:79; Vayda, 1971:17; Morren, 1986:269- 
270), but Meggitt (1962, 1965) and Ryan (1959) provide the most detailed 
descriptions of new group formation in two highland societies. 

The Enga have a nested hierarchy of patrilineal descent groups. The phratry 
is the most inclusive, followed by the clan, the subclan, the patrilineage, and the 
family. Groups everywhere in the hierarchy may grow or decline over time, 
generate daughter groups, or become absorbed by other groups: “Groups may 
emerge, increase in size and take over different functions, and in doing so achieve 
higher status by becoming co-ordinate with groups that previously included 
them. In absorption, groups that are decreasing in numbers have to relinquish 
particular functions and descend to a lower level in the hierarchy. ... If the 
decline continues, the groups eventually vanish” (Meggitt, 1965:79). For a group 
to achieve or retain a particular position in the hierarchy, it must contain enough 
members to perform the functions appropriate to that position. For example, 
from 1900 onward, the population of one Enga clan began increasing noticeably 
until one of its two subclans could no longer support itself on its share of land 
and began encroaching on a neighboring clan’s territory (Meggitt, 1965:62-63). 
In skirmishes with the neighboring clan, the subclan functioned as if it were a 
sovereign clan, fighting and negotiating homicide payments independently of the 
second subclan, which was itself trying to expand in another direction. Even- 
tually members of the two subclans settled at opposite ends of the clan territory 
and behaved as members of separate clans by intermarrying. 

Meggitt (1965:78-79) gives an account of two Laiapu Enga phratries 
demonstrating extinction and new group formation. Each phratry was initially made 
up of four territorial clans. One expanding clan of phratry A attacked and killed 
many members of two clans of phratry B. The survivors of the two clans fled to 
other clans, and the victorious clan occupied the abandoned territory. This 
successful clan was becoming so large as to achieve subphratry status (Meggitt, 
1965:79). Ryan (1959) gives similar accounts of group extinction and new group 
formation in the Mendi Valley. When clans become too populous, they expand 
into new territory and an off-shoot subclan occupies it. The breakaway subclan 
attains clan status as it takes on more and more functions appropriate to a clan. 


Cultural Variation among Groups 

Group extinction and group fission will lead to cultural change only if there are 
transmissible cultural differences that affect the extinction rate or the prolifer- 
ation rate. Unfortunately, there is little evidence about the amount of cultural 
variation among local groups because so few ethnographers study more than one 
local group. Furthermore, there is even less evidence about how differences 
between local groups are related to individual and group fitness in New Guinea 
ethnography, although there is quite good evidence from other areas that such 
variation exists (e.g., Kelly’s [1985] study of the causes of Nuer expansion at 
the expense of the Dinka). Nor is there evidence about how long such 
differences can persist in New Guinea groups. Archaeological and linguistic data 
from small-scale societies elsewhere document many examples of group 
expansion by cultures with more effective social organization in which the 
differences persisted for many generations during the expansionary phase (e.g., 
Bettinger and Baumhoff’s [1982] study of the Numic expansion from 
southeastern California across the Great Basin). 

Here we review three detailed studies of cultural variation among local 
groups in New Guinea. Two of these studies focus on the Mountain Ok of Papua 
New Guinea, while the third covers the lowland Tor region of northern Irian 
Jaya. Each of these studies suggests that there is substantial cultural variation 
among local groups. 


The Mountain Ok 

The Mountain Ok occupy the center of New Guinea and are made up of nine 
“tribes” based on ethnolinguistic affinities (Morren, 1986:180-181). Within 
these tribes are endogamous “communities,” sometimes composed of several 
exogamous clans. Only 15 percent of marriages take place between members of 
different communities (Barth, 1971:176). 

Ritual practice and belief vary considerably from community to community. 
Ritual knowledge, surrounded by secrecy, is fully shared by only a few elders in 
each community. It is transmitted at male initiations, where it is rationed out to 
initiates in steps. Barth argues that the ritual knowledge of different communi- 
ties diverges because of error and innovation on the part of the few persons who 
control it. This produces intergroup variation in such things as the interpretation 
of important ritual symbols, the use of myths in ritual contexts, theories of 
conception, and the emphasis on symbolic constructions of human sexuality in 
ritual (Barth, 1987). 

Sacred objects used in the initiation ritual take on different symbolic 
meaning in different communities (Barth, 1987:4-5). For example, fat from a 
wild male boar is emphatically “male” among both the Bimin-Kuskusmin and 
Baktaman of the Faiwolmin tribe. The pig’s fat is mixed with various substances 
to form a red paint that is applied to the bodies of novices, except for their 
“female” parts. In communities of the Telefolmin tribe, however, the red paint 
signifies female menstrual blood. In fact, menstrual blood is sometimes added to 
the concoction, a practice which would be “completely destructive” to the 
integrity of the Faiwolmin rituals. 

Modes in which cosmological ideas are communicated also differ among 
Ok communities. The Baktaman know almost no myths at all. A peripheral Ok 
community, the Mianmin, has a larger corpus of myths, but these are not central 
to their ritual events. The Bimin-Kuskusmin, in contrast, have an abundance of 
myths that are integrated into ritual (Barth, 1987:5-6). 

Theories of conception differ among communities (Barth, 1987:13-15). 
Members of the Baktaman and neighboring communities believe that children 
spring from male semen that is nourished in the mother. Telefolmin males 
believe that children are created from a fusion of male and female substances; 
females believe that a fusion of male and female substances creates only the flesh 
and blood of a child, while the female’s menstrual blood alone forms the bones. 
Other communities are characterized by still different theories of conception. 


The Faiwolmin 

Variation among communities within the Faiwolmin tribal area of the Ok region 
may provide an example of cultural variation that is linked to group survival. 
Barth (1971; cf. Morren, 1986) argues that more elaborate, communal rituals and 
specialized cult houses lead to more centralized community organization, which 
increases the survivability of the communities embracing them, and that com- 
munities with less elaborate cultural forms and more dispersed settlement pat- 
terns are more likely to become extinct. Within the Faiwolmin tribal area, ritual 
organization and specialization find their most elaborate expression in the cen- 
tralized communities (Barth, 1971:179-181). Male initiation is organized in 
seven grades through which males pass as age-sets. In western communities there 
are four such grades, and in the southeastern communities they range from four 
to one (p. 185). Different rituals take place in specialized cult houses. Most 
Faiwolmin communities contain three permanent cult houses as well as a com- 
munal men’s house. As one moves east and southward from central Faiwolmin, 
the number of cult houses declines. Most of the southeastern communities 
contain only one permanent cult house, and some perform initiations in tem- 
porary structures. 

There is also variation in social organization among Faiwolmin communities, 
following a similar west-to-east pattern of decreasing centralization (Barth, 
1971:184-186). The centralized communities of the Faiwolmin form compact 
villages around several types of semipermanent cult houses, and several exoga- 
mous clans make up an isolated, largely endogamous political unit. In the east 
the population is dispersed within the community territory, shifting household 
locations at intervals because of soil depletion or fear of sorcery. 

According to Barth, “The dispersed pattern without the cult houses . . . clearly 
organizes a smaller population for defense, and their history of displacement 
would seem to demonstrate this disadvantage” (p. 189); “the greater centraliza- 
tion clearly also offers military advantages and has resulted in conquest and terri- 
torial expansion of the more highly centralized groups in a general south-eastward 
direction” (p. 186). He argues that the elaborate rituals and the concomitant 
communal centralization were first introduced to the Faiwolmin communities 
from the northwest, and the diffusion of these cultural forms created cultural 
variation among them. Finally, selection among groups increased the frequency of 
those cultural forms conferring the highest fitness on groups (p. 188): 

The distribution of [cultural] forms is thus generated by a number of 
simultaneously partly independent processes. A process of diffusion 
from an innovation centre . . . seems to be taking place. Simultaneously, 
the organization of local cultural transmission is such that both loss and 
improvisation occur and new local variants emerge. Different ritual 
forms imply different community types; these again confront each other 
in warfare and compete and replace each other on the basis of their 
unequal defensive and offensive capacities. 

If Barth is correct, this is an example of group selection increasing the 
cultural variants that enhance group survival. He considers the alternative 
hypothesis that ecological processes explain the smaller scale of social organi- 
zation. Although he cannot completely rule out an ecological explanation, he 
clearly suggests that a ritual system that organizes more people and thus leads to 
a greater frequency of victory in violent conflicts is leading to the spread of more 
complex ritual (pp. 188-189). 


The Tor 

Significant cultural variation also existed between tribal territories of the Tor 
region (Oosterwal, 1961). The Tor region is divided into 26 tribal territories, but 
it has 8 separate languages (Oosterwal, 1961: appendix). Thus, many adjacent 
tribes speak different languages, although the most common language, that of the 
Berrick, is known by members of all tribes (Oosterwal, 1961:18). Oosterwal also 
notes these differences: “the three culture areas in the Tor district are very 
distinct. . . . [There are] differences in . . . kinship terminology, the kinship structure, 
the socio-religious aspect of culture, the way of counting, language-[dialect]-
differentiations, and some aspects of material culture” (p. 46). These three 
“cultural areas,” with associated kinship terminologies, are the Berrick, the Ittik 
and Mander, and the Segar and Naidjbeedj. Tribes in “transitional zones” have 
elements of all three cultural areas, and there is variation within each area (pp. 
149-174). The terminology of the Berrick tribe emphasizes the age criterion (e.g., 
MoElSi is terminologically distinguished from MoYoSi) but often ignores the 
generational criterion (e.g., MoBr and SiSo call each other by the same term). The 
terminology of the second cultural area ignores the generational criterion to a far 
greater extent. In contrast to those of the previous two areas, cultures in the third 
region have a strong generational aspect in their terminology. There is also 
variation within each of these three broad areas. For example, the cousin terminology 
of the Berrick is of the Hawaiian type (all cross and parallel cousins called by the 
same terms as those for sisters), while the Waf and Goeammer (of the same 
culture area) use the Iroquois type (FaSiDa and MoBrDa called by the same terms 
but terminologically differentiated from parallel cousins and from sisters, parallel 
cousins commonly but not always classified with sisters). 

Although it is difficult to show that the particular group extinctions that we 
have counted for the five regions are due to persistent cultural differences, there 
is abundant evidence in New Guinea and elsewhere that cultural differences do 
lead to the success of some groups and the decline of others. For example, among 
the Fore the practice of mortuary cannibalism caused the spread of the deadly 
disease kuru. According to Durham’s (1991:411-413) account of this episode, 
ritual cannibalism was originally adopted by Fore women as a response to a short- 
age of game. Nevertheless, the spread of the disease as a by-product of this ritual 
innovation threatened Fore groups with extinction until modern medical teams 
intervened. This case points up the ambiguous role of rational choice in the group- 
selection process. Individual calculation of advantage may often run counter to 
group advantage, especially when acts of cooperation are involved. Rappaport 
(1979:100) called attention to the role of the sacred in concealing 
group-advantageous traits from ready attack by selfish reason. As the Fore experience 
with kuru illustrates, traits disadvantageous to groups (and to individuals in this 
case) may sometimes be concealed in the same way. 

Knauft (1985) gives an example of an apparent group extinction in progress. 
The completely acephalous Gebusi were a small and declining group at the time 
of his study. The better-organized Bedamini, making use of the big-man style of 
political organization, were able to raid Gebusi villages, but the Gebusi were 
unable to organize an effective defense or a retaliatory response. The boundary 
Gebusi villages most exposed to Bedamini raids were in the process of 
assimilating to Bedamini customs. 

Knauft (1993) also provides examples of cultural differences among seven 
culture areas along New Guinea’s south coast. He describes how the Marind- 
Anim system of mythico-religious affiliation supports intragroup peace and the 
organization of large-scale head-hunting raids against distant enemies. By con- 
trast, the Purari head-hunt among themselves and are declining relative to their 
neighbors. The existence of considerable variation at the scale of language groups 
suggests a considerable time depth for these differences. Although this variation 
occurs among larger groups than we are concerned with here, it does show that 
variation in sociopolitical organization encoded in myth and religion has a strong 
effect on group success. 

It is also important that cultural differences between groups persist on time 
scales sufficient for the operation of group selection. Although there is variation 
among local groups in New Guinea, there are no data bearing on the question of 
how long that variation persists. However, there is ample evidence for the long- 
term persistence of cultural differences among larger groups in other culture 
areas. For example, concepts such as mana and tabu typify political culture 
throughout Polynesia despite the fact that these societies have been isolated 
from each other for more than 1,000 years (Kirch, 1984). Egerton (1971) doc- 
uments the existence of important differences among four tribal groups living in 
two different types of environment, including two tribes belonging to the Bantu 
and two to the Kalenjin language groups, which have been separated for thou- 
sands of years. He notes that tribal history is more important than contemporary 
environmental circumstances in explaining most of the variation in attitudes and 
values measured in his data. The roots of the 38 languages of Western American 
Indians go back 6,500 years, and cultural differences among close neighbors with 
different cultural history have persisted for long periods (Jorgensen, 1980:109). 
Belgium is divided by a stable linguistic boundary, with a Flemish North and 
a Walloon South (van den Berghe, 1981); despite the fact that there is no to- 
pographical separation, the linguistic frontier has persisted for 2,000 years. Such 
examples from archaeology and history can be multiplied at will. While they do 
not prove that cultural differences can persist at smaller scales as required by the 
model, they indicate that this assumption is plausible. 

Discussion 

Cultural group selection can explain the evolution of group-functional behaviors 
and institutions in human societies if two conditions are met: first, there must be 
some mechanism that preserves between-group variation so that group selection 
can operate. The model described provides one such mechanism, and we have 
here tested several of the model’s basic assumptions against the ethnographic 
record to determine if those assumptions are empirically realistic. Second, group 
selection must be sufficiently rapid to explain observed patterns of cultural 
change. The data from Papua New Guinea and Irian Jaya allow us to estimate the 
maximum rate of adaptation through group selection. Thus, we can estimate a 
minimum time period in which the group-selection process can give rise to 
group-level adaptations. Cultural changes that have occurred on a longer time scale are 
possibly the result of group selection; cultural changes that have occurred on a 
shorter time scale are unlikely to have resulted directly from group selection, but 
they may be its indirect result. For example, cultural group selection may lead to 
the evolution of property rights, which lead to efficient allocations of resources, or 
of political institutions that lead to group-beneficial decisions. 


Model Assumptions 

The data from New Guinea provide some qualified support for the model of 
group selection described. 

1. Group disruption and dispersal are common. Extinction rates per 
generation range from 2 percent to 31 percent, with a median of 
10.4 percent in the five areas for which quantitative data are 
available, and the frequent mention of extinction elsewhere suggests 
that these rates are representative. 

2. New groups are usually formed by fission of existing groups. The 
detailed picture from the Mae Enga and the Mendi is supported 
by anecdotal evidence from other ethnographies. We are not aware 
of any ethnographic report from New Guinea in which colonists of 
new land are drawn from multiple groups. 

3. There is variation among local groups, but it is unknown whether 
this variation persists long enough to be subject to group selection 
and whether this variation is responsible for the differential ex- 
tinction or proliferation of groups. 


Rates of Change 

The New Guinea data on extinction rates allow us to estimate the maximum rate 
of cultural change that can result from cultural group selection. For a given group 
extinction rate, the rate of cultural change depends on the fraction of group 
extinctions that are the result of heritable cultural differences among groups. If 
most extinctions are due to nonheritable environmental differences (e.g., some 
groups have poor land) or bad luck (e.g., some groups are decimated by natural 
disasters), then group selection will lead to relatively slow change. If most ex- 
tinctions are due to heritable differences (e.g., some groups have a more effective 
system of resolving internal disputes), then group selection can cause rapid 
cultural change. The rate of cultural change will also depend on the number of 
different, independent cultural characteristics affecting group extinction rates. 
The more different attributes, the more slowly will any single attribute respond 
to selection among groups. By assuming that all extinctions result from a single 
heritable cultural difference (or tightly linked complex of differences) between 
groups, we can calculate the maximum rate of cultural change. 

Table 11.3. Minimum number of generations necessary to change the 
fraction of groups in which a favorable trait is common assuming a particular 
extinction rate 

Initial fraction   Final fraction               Extinction rate
favorable trait    favorable trait    1.6%     10.4%    17.9%    31%

0.1                0.9                192      40.0     22.3     11.8
0.01               0.99               570      83.7     46.6     24.8

Note: Extinction rates were chosen as follows: 1.6 percent (for the Maring) is the lowest 
estimate, 10.4 percent is the median extinction rate, 17.9 percent (for the Mae Enga) is the 
estimate based on the best data, and 31 percent (for the Fore/Usufura) is the highest estimate. 


Such an estimate suggests that group selection is unlikely to lead to 
significant cultural change in less than 500 to 1,000 years. The length of time it takes 
a rare cultural attribute to replace a common cultural attribute is one useful 
measure of the rate of cultural change. Suppose that initially a favorable trait 
is common in a fraction q_0 of the groups in a region. Then the number of 
generations (t) necessary for it to become common in a fraction q_t of the groups can 
be estimated (see Appendix). The time necessary for different parameters is 
given in table 11.3. If we take the median extinction rate as representative, these 
results suggest that group selection could cause the replacement of one cultural 
variant by a second, more favorable variant in about 40 generations, or roughly 
1,000 years. If we take the extinction rate calculated using the best data, those 
from the Mae Enga, this time is cut roughly in half. These calculations assume 
that colonizing groups are selected at random from the population. If group 
proliferation is as selective as group extinction, then the time is again cut in half, 
reducing the substitution time (based on the median extinction rate), once again, 
from 1,000 to 500 years. Not all extinctions and new group formations result 
from heritable cultural differences. Since the New Guinea ethnographic data are 
not sufficient to estimate the extent to which cultural variation influences group 
extinctions, it is not possible to make an estimate of the actual strength of group 
selection in New Guinea. If such estimates were possible, we expect that they 
would show that actual rates are considerably below the maximum. The max- 
imum rate is nevertheless useful as an upper bound on the kinds of evolutionary 
events that cultural group selection might explain. 
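
The substitution times discussed above can be approximated with a short calculation. The sketch below is not the Appendix derivation, which is not reproduced here. It rests on an assumed recursion: each generation, groups lacking the favorable trait go extinct with a probability equal to the extinction rate, groups carrying it survive, and vacated territories are refilled from the survivors in proportion to their frequency. Under that assumption the odds q/(1 - q) grow by a factor of 1/(1 - e) per generation. The function name and the `YEARS_PER_GENERATION` constant are illustrative labels, not the authors'.

```python
import math

YEARS_PER_GENERATION = 25  # generation length used for table 11.1


def generations_to_spread(q0, qt, extinction_rate):
    """Generations for a trait to rise from fraction q0 to fraction qt of groups.

    Assumed model (an illustration, not the source's Appendix):
        q' = q / (q + (1 - q) * (1 - e))
    so the odds q / (1 - q) are multiplied by 1 / (1 - e) each generation.
    """
    odds_ratio = (qt / (1.0 - qt)) / (q0 / (1.0 - q0))
    return math.log(odds_ratio) / -math.log(1.0 - extinction_rate)


for rate in (0.104, 0.179, 0.31):
    t = generations_to_spread(0.1, 0.9, rate)
    years = t * YEARS_PER_GENERATION
    print(f"extinction rate {rate:.1%}: {t:.1f} generations, about {years:.0f} years")
```

With these assumptions the sketch comes close to the entries of table 11.3 for the 10.4, 17.9, and 31 percent columns and to the 570 generations listed for 0.01 to 0.99 at 1.6 percent. It does not reproduce the 192 listed for 0.1 to 0.9 at 1.6 percent (it gives about 272), so it should be read as an approximation to the Appendix calculation rather than a restatement of it.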

Our estimate of the maximum rate of adaptation suggests that group se- 
lection is too slow to account for the many cases of cultural change that occur in 
less than 500 to 1,000 years. For example, according to Feil (1987) the arrival of 
the sweet potato in the highlands of New Guinea sometime in the eighteenth 
century led to many important cultural changes. The introduction of the horse 
to the Great Plains of North America in the 1500s led to the evolution of the 
culture complex of the Plains Indians in less than 300 years. If the rates of group 
extinction estimated for New Guinea are representative of small-scale societies, 
cultural changes such as these cannot be explained in group-functional terms. 
There has not been enough time for group selection to have driven a single 
cultural attribute to fixation, even if that attribute had a strong effect on group 
survival. Processes based on individual decisions are likely to account for such 
episodes of rapid evolution (see Smith and Winterhalder, 1992; Boyd and 
Richerson, 1985). Such processes will not lead to group-functional outcomes 
except in certain special circumstances (see n. 1). It is possible that situations in 
which a trait or trait complex that increases the scale of cooperation is spreading, 
such as the one Barth posits for the Faiwolmin, do show rapid cultural group 
selection in progress. If the arrival of the sweet potato a few centuries ago did 
provide the subsistence basis for larger and more complex societies, we might 
expect to observe group selection in the early to middle stages of the spread of 
newly advantageous forms of social organization (Golson and Gardner, 1990; 
Feil, 1987). 

These results also suggest that group selection cannot justify the practice 
of interpreting many different aspects of a culture as group-beneficial. A given 
extinction rate will lead to slower change if many different, unrelated aspects of 
the culture affect group survival. Suppose that both beliefs about food con- 
sumption and beliefs about spatial organization affect group survival. Then, 
unless each extinction occurs in a group in which both deleterious beliefs about 
food consumption and deleterious beliefs about spatial organization are com- 
mon, some extinctions have no effect on the fraction of groups with deleterious 
beliefs about food, and some extinctions have no effect on the fraction of groups 
with deleterious beliefs about spatial organization. Thus, a given number of 
extinctions must lead to slower evolution of each character than would be the 
case if only one of the characters affected group survival. If group selection can 
cause the substitution of a single trait in 500 to 1,000 years, the time required for 
many traits will be substantially longer. We know from linguistic and archaeological 
evidence that related cultural groups that differ in many cultural attributes have 
often diverged from a single ancestral group in the past few thousand years. 
Thus, there has not been enough time for group selection to have produced the 
many attributes that distinguish one culture from another. 

It is important to understand that slow does not necessarily mean weak. 
When individual decision making is in opposition to group function in every 
group, then the relatively slow group-selection process will be too weak to favor 
group-functional behaviors. But when social interaction results in many 
alternative stable social arrangements, then individual decision making maintains 
differences among groups. If the resulting variation is linked to group fitness, 
then group selection will proceed. For example, consider the response to an 
environmental change such as the opening of New Guinea to trade with Euro- 
peans. Initially, changes in the costs and benefits of alternative beliefs and values 
will cause rapid cultural change, soon leading to a new sociopolitical equilibrium 
in each culture. But if there are many alternative equilibria, the nature of each 
new equilibrium may depend on existing norms and values. As long as the 
resulting differences affect group survival, selection among groups will continue. 
Over a millennium or so, New Guinea societies with a better political adaptation 
to world contact will replace those with a poorer adaptation. 

Thus, it follows that these results do not preclude interpreting some aspects 
of contemporary cultures in terms of their benefit to the group. The model 
demonstrates that under the right conditions group selection can be an important 
process, and the data from New Guinea suggest that some of these conditions are 
empirically realistic. The data also suggest that the rates of group extinction are 
high enough to cause a small number of traits with substantial effects on group 
welfare to evolve on time scales that characterize some aspects of cultural change. 
Group selection cannot explain why the many details of Enga culture differ from 
the many details of Maring culture. It might explain the existence of geographi- 
cally widespread practices that allow large-scale social organization in the New 
Guinea highlands, practices that evolved along with, and perhaps allowed, the 
transition from band-scale societies to the larger-scale societies that exist today. 

Cultural group selection provides a potentially acceptable explanation for 
the increase in scale of sociopolitical organization in human prehistory and 
history precisely because it is so slow. Scholars convinced of the overwhelming 
power of individual-level processes have real difficulty in explaining slow, long- 
term historical change. Anatomically modern humans appear in the fossil record 
about 90,000 years ago, yet there is no evidence for symbolically marked 
boundaries (perhaps indicative of a significant sociopolitical unit encompassing 
an “ethnic” group of some hundreds to a few thousand individuals) before about 
35,000 years ago (Mellars and Stringer, 1989). The evolution of simple states 
from food-producing tribal societies took about 5,000 years, and that of the 
modern industrial state took another 5,000. Evolutionary processes that lead 
to change on 10- or 100-year time scales cannot explain such slow change unless 
they are driven by some environmental factor that changes on longer time scales. 
In contrast, the more or less steadily progressive trajectory of increasing scale of 
sociopolitical complexity over the past few tens of thousands of years indeed is 
consistent with adaptation by a relatively slow process of group selection. 

These results should be interpreted with caution. It is important to re- 
member that we have estimated a maximum rate of change for group selection 
on the basis of the assumptions that observed differences among local groups are 
heritable and that they are persistent. Unless both assumptions are satisfied, 
group selection will be less important than our results indicate. It is also im- 
portant to keep in mind that we have studied only one form of group selection — 
competition among small, culturally heterogeneous groups. Other plausible 
group-selection processes might lead to more rapid change. For example, one 
cultural region may encroach upon another along a frontier, constantly capturing 
additional land and gradually expanding its domain. The Nuer and Dinka formed 
such a system before they were both overtaken by European colonists (Kelly, 
1985). In state-level societies, we have to allow for internal group selection via 
the extinction and proliferation of subgroups, such as ruling classes, interest 
groups, firms, and the like, as well as selection among states themselves (Hannan 
and Freeman, 1989). Some economists have considered business failure and 
proliferation rates sufficient to drive group selection of these units (Alchian, 
1950; Nelson and Winter, 1982). The development of collective decision- 
making institutions like bureaucracies and legislatures may permit group- 
functional behaviors to be deliberately adopted by state-level societies. These 
processes might act at a much faster rate than we have estimated on the basis of 
tribal institutions. 

In conclusion, these data suggest that group selection cannot explain rapid 
cultural change or the many differences between related cultures. However, they 
also show that group selection, perhaps in concert with other processes, is a
plausible mechanism for the evolution of widespread attributes of human soci- 
eties over the long run. 


APPENDIX: Time for Trait Substitution 


Assume that there are two cultural variants — deleterious and advantageous. Each 
is at a local equilibrium under the influence of within-group processes. Groups are 
connected by the mixing of individuals, and there are many such groups. Groups 
in which the advantageous variant is common never go extinct. A fraction e of the 
groups in which the deleterious variant is common suffer an extinction each gener- 
ation. The dynamics of this system are quite complicated because the frequency of 
advantageous variants within subpopulations in which that variant is common de- 
pends, to a small degree, on the frequencies of both variants in the population as 
a whole. However, if both variants are in local equilibrium, even when there is only a 
single population in which they are common, then it is roughly correct to regard the 
subpopulations as individuals and use formulas from population genetics (see Boyd 
and Richerson, 1990b for a fuller treatment). Then, if the advantageous trait is 
common in a fraction q of the groups in the region, after one generation 

$$q' = \frac{q}{(1 - q)(1 - e) + q}$$

and the frequency after t generations is

$$q_t = \frac{q_0}{(1 - q_0)(1 - e)^t + q_0}$$

Solving this for t yields

$$t = \frac{\ln\left[\dfrac{q_0 (1 - q_t)}{q_t (1 - q_0)}\right]}{\ln(1 - e)}$$

the expression used to generate table 11.3.
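
The recursion in this appendix can be checked numerically. The sketch below assumes the one-generation update q' = q / [(1 - q)(1 - e) + q], where q is the fraction of groups in which the advantageous variant is common and e is the per-generation extinction rate of the other groups. The function names and the example values (q0 = 0.1, a target of 0.9, e = 0.1) are illustrative assumptions, not the values behind table 11.3.

```python
import math


def next_fraction(q, e):
    """One generation: groups with the deleterious variant survive with probability 1 - e."""
    return q / ((1.0 - q) * (1.0 - e) + q)


def fraction_after(q0, e, t):
    """Closed form for the fraction of groups with the advantageous variant after t generations."""
    return q0 / ((1.0 - q0) * (1.0 - e) ** t + q0)


def generations_to_reach(q0, q_target, e):
    """Generations needed to go from q0 to q_target (the closed form solved for t)."""
    odds_ratio = (q0 * (1.0 - q_target)) / (q_target * (1.0 - q0))
    return math.log(odds_ratio) / math.log(1.0 - e)


if __name__ == "__main__":
    q0, q_target, e = 0.1, 0.9, 0.1  # hypothetical illustration values
    t = generations_to_reach(q0, q_target, e)
    print(f"about {t:.1f} generations")
```

Iterating `next_fraction` t times should reproduce `fraction_after(q0, e, t)`, which is a quick way to confirm that the closed form matches the recursion.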


NOTES 

We thank Philip Newman, Paul Sillitoe, Andrew Vayda, Mark Allen, and Bob 
Rechtman for help in locating data used in this analysis. Joan Silk, Timothy Earle, 
Eric Smith, Paul Allison, Lore Ruttan, Mark Jenike, Alan Rogers, Monique Bor- 
gerhoff Mulder, and an anonymous referee provided very useful comments on earlier 
drafts of this chapter. Members of the University of Bielefeld’s Center for Interdis- 
ciplinary Research project on the Biological Foundations of Human Culture provided 
a constructively critical audience for an early version (special thanks are due its 
director, Peter Weingart). Jonathan Turner convinced us that state-level institutions 
are different from tribal ones. 

Some authors (e.g., Harris, 1979) have suggested that the self-interested choices 
of individuals will result in group-beneficial behavior. However, this claim is not 
cogent — group-beneficial behavior will not result from individual choice except as a 
side effect of other processes or in certain limited circumstances. For example, many 
authors have suggested that food taboos exist because they prevent overexploitation
of ecological resources. To keep things simple, let us suppose that individuals must 
choose to observe a particular taboo or not and that individuals who observe this 
taboo forgo a satisfying and nutritious food item. Choosing to ignore the taboo has a 
positive effect on individuals’ own welfare and, by assumption, a negative effect on 
the welfare of the group. However, unless the group is very small, the personal effect 
will be much larger than the effect on the group, and thus choosing to ignore the 
taboo will better serve individuals’ goals, even if their goals include the welfare of the 
group. This effect is at the heart of both rational-strategy and evolutionary arguments 
against the easy development of group-beneficial behavior. The effect is not a matter 
of cognitive capacity, as writers such as Harris seem to imply. Rational strategists are 
assumed to have unlimited cognitive capacity, whereas evolved creatures are the 
products of blind selective sorting, but the essential problem is the same; both ra- 
tional strategists and evolved creatures are expected to act in their own self-interest. 

Group-beneficial behavior may result from self-interested individual choice 
under certain circumstances. First, since individual and group benefit are often cor- 
related, individual choice may often produce group-beneficial outcomes as a side 
effect (see Sugden, 1986, for several examples). Second, markets will lead to an
“efficient” allocation of economic resources if the state or some other external au- 
thority enforces contracts, external effects such as air pollution are not present, and a 
number of other conditions are satisfied. The allocation is efficient only in the sense 
that no one can be made better off without someone else’s being made worse off — 
the distribution of wealth that results could be extremely deleterious to the survival 
of the society. Clearly, most aspects of culture are not regulated by markets or prices, 
even in contemporary societies. Third, rational planning by leaders or institutions 
may also lead to group-beneficial outcomes. While the extent to which political 
institutions can ever be modeled as acting in the common interest is debatable, it is 
clear that most aspects of culture are not the result of rational planning. Finally, 
individuals may choose group-beneficial activities if they value those activities for 
their own sake, not because they benefit the group (Margolis, 1982; Batson, 1991). 
For example, men may fight to defend the group if they value heroism in battle. 
However, one is left with explaining how men come to have such preferences — 
otherwise, the explanation is that people choose group-beneficial behaviors because 
they like to do so. Thus, we do not deny that people make group-beneficial choices. 
We are claiming that when such choices occur, they cannot be the result of mainly 
self-interested choice. 


12. Group-Beneficial Norms Can Spread Rapidly in a Structured Population


Many culturally transmitted norms are group-beneficial (Sober and 
Wilson, 1998): property rights encourage productive effort, rules against murder
and assault encourage civil order, norms governing the filling of political offices 
reduce the chances of civil war, and product standards, building codes, and rules 
of professional conduct allow more efficient commerce. For most of human 
history, states were weak or nonexistent, and norms were not enforced by ex- 
ternal sanctions. Nonetheless, norms were important regulators of social order, 
and while in modern states black-letter laws also further many of the same ends 
as informal norms, the evidence is that informal custom still plays a very
important role in regulating behavior (Ellickson, 1991).

The persistence of group-beneficial norms is easily explained. When people 
interact repeatedly, behavior can be rewarded or punished, and such incentives 
can stabilize almost any behavior once there is consensus about what is nor- 
mative. People conform to normative behavior in order to gain rewards or avoid 
punishment. The provision of rewards and punishments can be explained in 
several ways: first, if interactions are repeated indefinitely, punishing or re- 
warding also can be normative behaviors, and violators of that norm can be 
punished or rewarded as well (Boyd and Richerson, 1992a). Second, even if
interactions do not go on indefinitely (or, equivalently, people cannot remember
a large number of interactions), the relative disadvantage suffered by those who
enforce social norms, compared with those who do not, rapidly becomes small
as the number of interactions increases and is easily balanced by even a weak
tendency to imitate the common type (Henrich and Boyd, 2001). (Of course,
strong conformism can also explain the maintenance of norms without
punishment; Boyd and Richerson, 1985.) Finally, punishment may be individually
beneficial if it is a costly signal of an individual’s qualities as a mate or coalition
partner (Bliege Bird, Smith, and Bird, 2001). Several authors suggest that the
stability of such norms explains human cultural diversity: distinct groups
represent alternative, stable equilibria in a complex, repeated “game of life” (Boyd
and Richerson, 1992b; Binmore, 1994; Cohen, 2001).

The fact that group-beneficial norms can persist does not explain why such 
norms are widely observed. While punishment and reward can stabilize
group-beneficial norms, they can also stabilize virtually any behavior (Fudenberg and
Maskin, 1986; Boyd and Richerson, 1992a). We can be punished if we lie or
steal, but we can also be punished if we fail to wear a tie or refuse to eat the 
brains of dead relatives. Thus, we need an explanation for why populations 
should be more likely to wind up at a group-beneficial equilibrium than one of 
the vastly greater number of stable but non-group-beneficial equilibria. Put an- 
other way, if social diversity results from many stable social equilibria, then 
social evolution must involve shifting among alternative stable equilibria. Group- 
beneficial equilibria will be common only if the process of equilibrium selection 
tends to pick out group-beneficial equilibria. 

Currently, there are two different kinds of models of equilibrium selection, 
but neither provides a plausible explanation for the widespread existence of 
group-beneficial norms. 

Within-group models of equilibrium selection (Kandori, Mailath, and Rob, 
1993; Ellison, 1993; Young, 1998; Samuelson, 1997) consider the effects of
random processes that act within groups to change the frequency of alternative 
behavioral strategies. In finite populations, sampling variation will affect patterns 
of interaction and replication, which in turn will lead to random fluctuations in 
the frequencies of types through time. As long as some mutation-like process 
acts to maintain variation, the probability that the population will be in any state 
will eventually converge to a stationary distribution. If mutation rates are low 
and populations are of reasonable size, most of the probability mass of the 
stationary distribution will pile up around the stable equilibrium of the deter- 
ministic dynamic model that has the largest basin of attraction. Since there is no 
necessary relationship between the size of a basin of attraction and whether it 
is group beneficial, within-group models do not predict that group-beneficial 
norms will be common. Within-group models also suffer from two other related 
problems. First, it takes a very long time for populations to shift from one 
equilibrium to another unless the number of interacting individuals is very small. 
Second, these models provide no mechanism for cumulative irreversible social 
change because populations are assumed to be in stochastic steady state, ran- 
domly wandering back and forth between alternative equilibria. 

Between-group models posit that equilibrium selection results from the 
competition between groups near alternative stable equilibria. These models 
assume that groups at more efficient equilibria are less likely to go extinct, or 
more able to compete with other groups in military or economic contests. This 
kind of group selection process leads to the evolution of group-beneficial equi- 
libria even when groups are large, and there is substantial migration between 
groups (Boyd and Richerson, 1982, 1990). However, given observed rates of
group extinction, the spread of group-beneficial equilibria will occur too slowly
to account for much observed social evolution. Calculations based on empirical
data on the social extinction of small groups in highland New Guinea suggest
that even though rates of extinction are appreciable, the time scale for the
substitution of one norm by a better one is on the order of a millennium (Soltis,
Boyd, and Richerson, 1995). Moreover, these models also lack any mechanism
that allows for the efficient recombination of group-beneficial innovations oc- 
curring in different groups, and thus cannot easily account for the cumulative 
nature of social change over the last 10,000 years. 

Here, we show that when the standard replicator dynamic model of evolu- 
tionary game theory is embedded in a spatially structured population, group- 
beneficial equilibria can spread rapidly and innovations can readily recombine to 
form beneficial new combinations. The basic logic of this result is simple: evolu- 
tionary game theory is applicable to human social evolution when behavioral 
strategies are transmitted by imitation, and people who have achieved high payoffs 
are most likely to be imitated. Strategies that have high average payoffs will in- 
crease in frequency, in most cases eventually leading to a stable evolutionary 
equilibrium state. If the payoff structure of social interactions leads to multiple 
stable equilibria and a population is structured, partially isolated groups can be 
stabilized at different equilibria with different average payoffs. Consequently, be- 
haviors can spread from groups at high payoff equilibria to neighboring groups at 
lower payoff equilibria because people imitate their more successful neighbors. 
Such spread can be rapid because it depends on the rate at which individuals 
imitate new strategies, rather than the rate at which groups become extinct. 

In what follows, we first derive the dynamic equations that govern replicator 
dynamics in a spatially structured population. We then show that these equa- 
tions can lead to the rapid spread of group-beneficial traits under plausible con- 
ditions. Finally, we show that this process readily leads to the recombination of 
different group-beneficial traits that arise in different populations. 


Replicator Dynamics in a Structured Population 

In many situations, people have important social interactions shaped by social 
norms with one group of people but know about the behavior, and the norms 
that regulate it, of a larger group of people. People interact every day with the 
members of their local group — they exchange food, labor, and land; aid others 
in need; marry and care for children — transactions that are regulated by social 
norms that define property rights and moral obligations. However, people also 
often know about the behavior of others in neighboring groups. They know that 
we can marry our cousins here, but over there they cannot; or anyone is free 
to pick fruit here, while there fruit trees are owned by individuals. With this kind 
of population structure, payoffs are determined by the composition of the local 
group, but cultural traits can diffuse among groups. 

To generalize evolutionary game theory to allow for this kind of popula- 
tion structure, consider a population that is subdivided into n large groups in 
which frequent social interaction occurs. Individuals are characterized by one of 
k strategies. The proportion of people in group d who have strategy i is $p_{id}$, and
the vector of frequencies in group d is $\mathbf{p}_d$. Social interaction generates a payoff,
$W_i(\mathbf{p}_d)$, for individuals with behavior i in group d that depends on individuals’
own strategy and the strategies of other members of their group because
frequent social interaction occurs with other group members.

To allow for the possibility of cultural diffusion between groups, we adopt 
the following model of cultural transmission: during each time period, each 
individual from group f encounters an individual, their “model,” from group d
with probability $m_{df}$ and observes that individual’s strategy and payoff from
social interaction during that period. We will assume that $m_{ff} > \sum_{d \neq f} m_{df}$, so that
most encounters occur within social groups. After the encounter, individuals 
may imitate the strategy of their model. 

We assume that individuals are more likely to imitate if their model has a 
higher payoff than they do. More formally, if an individual with behavior i from
group f encounters an individual with behavior j from group d, individual i
switches to j with this probability:

$$\Pr(j \mid i, j) = \tfrac{1}{2}\left\{1 + \beta\left[W_j(\mathbf{p}_d) - W_i(\mathbf{p}_f)\right]\right\} \qquad (1)$$

where $\beta$ is a positive parameter that scales payoffs so that $0 < \Pr(j \mid i, j) < 1$ for
all $\mathbf{p}_d$ and $\mathbf{p}_f$. Equation (1) implies that individuals sometimes switch to a lower
payoff strategy, unlike some recent derivations of replicator dynamics (Borgers 
and Sarin, 1997; Schlag, 1998; Gale, Binmore, and Samuelson, 1995). We think 
this model is preferable because it captures the effect of uncertainty about the 
payoffs of others, and because it allows diffusion between groups even when 
there are no payoff differences, a conservative feature that reduces the effect of 
population structure. 
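
The imitation rule can be written as a one-line function. This is a minimal sketch that assumes equation (1) has the form one half times (1 + beta times the model's payoff advantage); the function name and the range check are illustrative additions.

```python
def switch_probability(own_payoff, model_payoff, beta):
    """Probability that an individual adopts the model's strategy.

    Assumed form of equation (1): 1/2 * (1 + beta * (model_payoff - own_payoff)).
    beta must be small enough that the result stays strictly between 0 and 1.
    """
    prob = 0.5 * (1.0 + beta * (model_payoff - own_payoff))
    if not 0.0 < prob < 1.0:
        raise ValueError("beta is too large for this payoff difference")
    return prob
```

With equal payoffs the function returns one half, which is the passive diffusion described above, and a model with a lower payoff is still imitated with positive probability.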

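Then the frequency of behavior i in group f, $p'_{if}$, after one time period is
given by equation (2):

$$p'_{if} = \sum_d m_{df} \Big[ p_{if} \sum_j p_{jd}\, \tfrac{1}{2}\big\{1 + \beta[W_i(\mathbf{p}_f) - W_j(\mathbf{p}_d)]\big\} + p_{id} \sum_j p_{jf}\, \tfrac{1}{2}\big\{1 + \beta[W_i(\mathbf{p}_d) - W_j(\mathbf{p}_f)]\big\} \Big] \qquad (2)$$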

The first sum inside the square brackets gives the probability that an individual with
trait i in group f remains the same, and the second sum gives the probability that
someone who is not i initially converts to i. Some algebraic manipulation yields the
following expression for the change in the frequency of behavior i in population f:

$$p'_{if} - p_{if} = \Delta p_{if}\Big(1 - \tfrac{1}{2}\sum_{d \neq f} m_{df}\Big) + \tfrac{1}{2}\sum_{d \neq f} m_{df}\Big\{\Delta p_{id} + (p_{id} - p_{if})\big[1 + \beta\big(\bar{W}(\mathbf{p}_d) - \bar{W}(\mathbf{p}_f)\big)\big]\Big\} \qquad (3)$$

where $\Delta p_{if} = \beta p_{if}\,[W_i(\mathbf{p}_f) - \bar{W}(\mathbf{p}_f)]$ is the replicator dynamic equation for strategy
i in group f, the canonical description of strategy dynamics in evolutionary
game theory, and $\bar{W}(\mathbf{p}_f) = \sum_j p_{jf} W_j(\mathbf{p}_f)$ is the average payoff in group f.
Thus, when individuals imitate only members of their own group
($m_{df} = 0$, $d \neq f$), equation (3) says that imitation within each group causes
behaviors with the highest payoff relative to others in the group to increase in
frequency; effects on average payoff within a group are irrelevant. When there
is contact between different groups, however, the effect of a behavior on average 
group payoff can become important. The second term in equation (3) includes 
the effect of diffusion between groups that differ in trait frequency. When 
payoffs do not affect imitation ($\beta = 0$), this term includes only passive diffusion.
However, when individuals with higher payoffs are more likely to be imitated, 
there is a net flow of strategies from groups with high average payoff to groups 
with lower average payoff. 
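
The one-period update can be sketched directly from the verbal description of equation (2). An individual with trait i in group f meets a model from group d with probability m_df and keeps i unless it switches, and a non-i individual converts to i when it meets and imitates an i model. The code assumes the switching rule Pr = 1/2 * (1 + beta * payoff difference). The data layout (`freqs[f][i]`, `mixing[d][f]`) and the toy payoff function in the usage note are illustrative assumptions.

```python
def average_payoff(p, payoff):
    """Mean payoff in a group with strategy frequencies p."""
    return sum(p[j] * payoff(j, p) for j in range(len(p)))


def step(freqs, mixing, payoff, beta):
    """One time period of the imitation dynamics (our reading of equation 2).

    freqs[f][i]  : frequency of strategy i in group f
    mixing[d][f] : probability that a member of group f meets a model from group d
    payoff(i, p) : payoff to strategy i in a group with frequencies p
    """
    n, k = len(freqs), len(freqs[0])
    # Payoffs depend only on an individual's own group, so compute them once.
    W = [[payoff(i, freqs[g]) for i in range(k)] for g in range(n)]
    new = []
    for f in range(n):
        row = []
        for i in range(k):
            total = 0.0
            for d in range(n):
                if mixing[d][f] == 0.0:
                    continue
                # An i-individual in f meets a j-model from d and keeps i.
                stay = sum(freqs[d][j] * 0.5 * (1.0 + beta * (W[f][i] - W[d][j]))
                           for j in range(k))
                # A j-individual in f meets an i-model from d and switches to i.
                convert = sum(freqs[f][j] * 0.5 * (1.0 + beta * (W[d][i] - W[f][j]))
                              for j in range(k))
                total += mixing[d][f] * (freqs[f][i] * stay + freqs[d][i] * convert)
            row.append(total)
        new.append(row)
    return new
```

For example, with a hypothetical coordination payoff `payoff = lambda i, p: 1.0 + 0.5 * p[i]`, calling `step(freqs, [[1.0, 0.0], [0.0, 1.0]], payoff, 0.5)` on two isolated groups changes each frequency by beta * p_i * (W_i - average payoff), which is the ordinary replicator dynamic. In every case the updated frequencies in each group still sum to one.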


How Group-Beneficial Equilibria Spread 

Next, we show how this effect can lead to the spread of group-beneficial equi- 
libria. Consider a simple model in which there are two strategies, 1 and 2. For 
example, strategy 1 might be a norm forbidding cousin marriage, while strategy 
2 is the norm allowing free choice of a spouse. Within each group, individuals 
who deviate from the common norm suffer because they are punished by other 
group members. The norm forbidding cousin marriage might lead to higher 
average payoff due to the formation of wider political alliances. We formalize 
these ideas by assuming that the payoff to an individual with behavior 1 in group
d is $W_1(p_{1d}) = 1 + s(p_{1d} - p) + g p_{1d}$ and the payoff to an individual using
behavior 2 is $W_2(p_{1d}) = 1 + g p_{1d}$. Thus, each strategy has a higher relative payoff
when common. The unstable equilibrium that divides the two basins of
attraction is p. The parameter s measures the magnitude of the difference in payoffs
of the two strategies, and g measures the effect of behavior 1 on average payoff.
We assume that g > 0, so that groups in which behavior 1 is common have higher
average payoff. For example, a norm against cousin marriage might lead to more
alliance formation among clans within the group. Finally, for simplicity, we
assume that social groups are arranged in a ring so individuals imitate only
members of their own group and the two neighboring groups (so that $m_{df} = m$
for the two neighbors of group f and zero otherwise).
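
The payoff structure and the ring of groups can be written down in a few lines. The sketch assumes the payoffs W1 = 1 + s(p1 - p) + g p1 and W2 = 1 + g p1, where p1 is the local frequency of behavior 1 and p is the unstable equilibrium (called `threshold` in the code). It also assumes that the within-group encounter probability is 1 - 2m, so that encounter probabilities sum to one. All function names are illustrative.

```python
def payoff_1(p1, s, threshold, g):
    """Payoff to behavior 1 when its local frequency is p1."""
    return 1.0 + s * (p1 - threshold) + g * p1


def payoff_2(p1, g):
    """Payoff to behavior 2 when behavior 1 has local frequency p1."""
    return 1.0 + g * p1


def mean_payoff(p1, s, threshold, g):
    """Average payoff in a group where behavior 1 has frequency p1."""
    return p1 * payoff_1(p1, s, threshold, g) + (1.0 - p1) * payoff_2(p1, g)


def ring_mixing(n, m):
    """mixing[d][f]: chance that a member of group f meets a model from group d."""
    if n < 3:
        raise ValueError("a ring needs at least three groups")
    mixing = [[0.0] * n for _ in range(n)]
    for f in range(n):
        mixing[f][f] = 1.0 - 2.0 * m   # assumed: remaining encounters are within-group
        mixing[(f - 1) % n][f] = m     # left neighbor
        mixing[(f + 1) % n][f] = m     # right neighbor
    return mixing
```

Behavior 1 earns more than behavior 2 exactly when its local frequency exceeds the threshold, and a group fixed on behavior 1 has a higher mean payoff than a group fixed on behavior 2 whenever g > 0.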

For a novel group-beneficial trait to evolve, two things must occur. First, it 
must become common in one population, and second it must spread from that 
population to others. Various random processes may cause the initial shift of one 
population to the group-beneficial equilibrium. In finite populations, sampling 
variation in who is imitated (Gale et al., 1995) or in patterns of interaction (Kandori
et al., 1993; Ellison, 1993; Young, 1998) can lead to random fluctuations in trait
frequencies that can tip populations into the basin of attraction of the group- 
beneficial equilibrium. Randomly varying environments can lead to similar shifts 
(Price, Turelli, and Slatkin, 1993) in populations. Finally, individual learning can 
be conceptualized as a process in which individuals use data from the environ- 
ment to infer the best behavior. Learning experiences of individuals within a 
population may often be correlated, because they are utilizing the same data. 
Thus, random variation in such correlated learning experiences could also cause 
equilibrium shifts in large populations. We do not model these processes here. 

To see how imitation of the successful can lead to the spread of
group-beneficial strategies, assume that one of these unmodeled processes causes the
group-beneficial strategy to become common in one group, while the other 
strategy remains common in the rest of the groups. Then, if enough individuals 
in the two neighboring groups imitate behavior 1, these groups will be tipped 
into its basin of attraction, and the group-beneficial trait will increase in those 
two groups. This process is illustrated in figure 12.1. Trait 1 is initially common 
in population i − 1. In the neighboring population i, trait 2 is common, and thus
within-group imitation tends to decrease the frequency of trait 1. However,
individuals in population i are more likely to imitate individuals in population
i − 1 than in population i + 1, so extra-group imitation tends to increase the
frequency of trait 1 in group i. If this latter process is sufficiently strong, it can tip
population i into trait 1’s basin of attraction. If this occurs, the process will be
repeated in group i + 1, then group i + 2, and so on, with behavior 1 spreading
throughout the population in a wave-like fashion. This process is formally 
similar to one recent model of the third phase of Wright’s shifting balance theory 
(Gavrilets, 1995), but is unlike that model in two ways. First, the underlying
dynamic processes arise from differential imitation, not changes in demography.
Second, because the multiple equilibria arise from frequency-dependent social
interaction, not underdominance, the process modeled here leads to the spread
of the group-beneficial trait for a wide range of parameters (figure 12.2).
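
The wave-like spread can be illustrated with a small deterministic simulation. This is a sketch under stated assumptions: two behaviors with payoffs W1 = 1 + s(x - p) + g x and W2 = 1 + g x (x is the local frequency of behavior 1 and p is the unstable equilibrium, `threshold` in the code), groups on a ring with neighbor encounter probability m and within-group probability 1 - 2m, and the imitation rule Pr = 1/2 * (1 + beta * payoff difference). The parameter values in the usage note are hypothetical choices, not those used for the figures.

```python
def payoffs(x, s, threshold, g):
    """Payoffs (W1, W2) in a group where behavior 1 has frequency x."""
    return 1.0 + s * (x - threshold) + g * x, 1.0 + g * x


def pair_term(x_f, x_d, s, threshold, g, beta):
    """Expected frequency of behavior 1 in focal group f after meeting models from group d."""
    w_f = payoffs(x_f, s, threshold, g)
    w_d = payoffs(x_d, s, threshold, g)
    p_f = (x_f, 1.0 - x_f)
    p_d = (x_d, 1.0 - x_d)
    # A behavior-1 individual in f meets a j-model from d and keeps behavior 1.
    stay = sum(p_d[j] * 0.5 * (1.0 + beta * (w_f[0] - w_d[j])) for j in range(2))
    # A j-individual in f meets a behavior-1 model from d and switches.
    convert = sum(p_f[j] * 0.5 * (1.0 + beta * (w_d[0] - w_f[j])) for j in range(2))
    return x_f * stay + x_d * convert


def step(x, m, s, threshold, g, beta):
    """Advance every group on the ring by one time period."""
    n = len(x)
    new = []
    for f in range(n):
        left, right = x[(f - 1) % n], x[(f + 1) % n]
        new.append((1.0 - 2.0 * m) * pair_term(x[f], x[f], s, threshold, g, beta)
                   + m * pair_term(x[f], left, s, threshold, g, beta)
                   + m * pair_term(x[f], right, s, threshold, g, beta))
    return new


def simulate(n, m, s, threshold, g, beta, steps):
    """Start with behavior 1 fixed in group 0 and absent elsewhere; return final frequencies."""
    x = [0.0] * n
    x[0] = 1.0
    for _ in range(steps):
        x = step(x, m, s, threshold, g, beta)
    return x
```

With the hypothetical values `simulate(n=12, m=0.01, s=0.1, threshold=0.3, g=0.25, beta=1.0, steps=6000)`, behavior 1 becomes common in every group, whereas setting `m=0.0` leaves it confined to the group where it started. Other parameter combinations need not produce spread.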

It is important to see that the spread of the group-beneficial trait depends 
crucially on the assumption that people imitate strategies that lead to success in 
neighboring groups, but will lower their payoff in their own group where dif- 
ferent norms are enforced. In this simple model, a type that restricted imitation 
to its own group would replace the type of imitation assumed here. We think 
our assumption is plausible nonetheless. Empirically, the tendency to imitate the 
successful has been observed in a wide variety of contexts (see Henrich and
Gil-White, 2001). This tendency makes sense adaptively. The world is complex and
hard to understand. It is very difficult in many situations to connect behavior to 
outcomes with much confidence. An individual observes that in the neighboring 
group they never marry cousins and that they are much better off. His neighbors 
say that the gods punish those who marry cousins, and they have had much 
greater success in warfare lately. Of course, the individual knows that it will 
cause trouble to forbid a marriage that both his daughter and his brother want, 
but maybe it will be worth it. The same kinds of uncertainties beset us in the 
modern world despite vastly greater information-gathering capacity. In the early
1990s it was commonplace to attribute Japan’s economic success to the
encouragement of long-term investment, to “just in time” inventory practices, or to
quality circles, and all of these practices were imitated by American firms
and policy makers. We have argued at length (Boyd and Richerson, 1985) that
cultural transmission rules like imitate the successful and imitate the common type 
should be seen as adaptations for dealing with this kind of uncertainty. We have a 
propensity to imitate the successful because it is often very difficult to decide 
what is the best behavior. These learning rules are shortcuts that on average 
allow us to acquire lots of useful information but may, as in the model in this 
chapter, sometimes lead us astray. 

Figure 12.1. This graph illustrates the assumed payoff structure and why it can lead to
the spread of group-beneficial traits. The top panel plots the payoffs to traits 1 and 2 as a 
function of the frequency of trait 1 in their local group. Each trait has a higher relative 
payoff when it is common, but increasing the frequency of trait 1 raises the payoff of all group 
members. As a result, within-group imitation increases the frequency of trait 1 above the 
threshold frequency p and increases the frequency of trait 2 below that threshold. The 
lower panel shows the state of a part of a population in which trait 1 is initially common in 
group i − 1 and trait 2 is common in all other groups. In group i, individuals are more
likely to imitate people in population i − 1 than in population i + 1 because the former
have higher payoffs than the latter. Thus, between-group imitation tends to increase the 
frequency of trait 1 in population i. If this effect is strong enough, it can tip group i into the 
basin of attraction of trait 1 and cause the spread of this group-beneficial trait. 

Figure 12.2 plots combinations of the parameters m, s, p, and g that lead to the 
spread of the group-beneficial strategy. It indicates that the group-beneficial 
strategy fails to spread under three circumstances. If there is too much mixing 
between neighboring groups, the beneficial strategy cannot persist in the initial 
population; it is swamped by the flow of behavior 2 from the neighboring groups. 

Figure 12.2. This graph shows the range of parameters over which the beneficial norm
spreads to all groups, eliminating the alternative norm, given that the beneficial norm is 
initially common in a single group. The vertical axis gives the ratio of m, the probability that 
individuals interact with others from one of the neighboring groups, to s, the rate of change 
due to imitation within groups. The horizontal axis plots p, the unstable equilibrium that 
separates the basins of attraction of group-beneficial and non-group-beneficial equilibria
in isolated groups. The shaded areas give the combinations of m/s and p that lead to the
spread of the group-beneficial strategy for three values of g. When g = 0, neither norm is
group-beneficial. Larger values of g mean that the group-beneficial norm leads to a greater 
increase in average payoff. When m is small, the group-beneficial norm cannot spread 
because there is not enough interaction between neighbors for the beneficial effects of 
the norm to cause it to spread. Very large values of m prevent the spread of the group- 
beneficial norm because it cannot persist in the initial population. If the domain of 
attraction of the group-beneficial strategy is too small, the flow of strategies from suc- 
cessful groups to less successful groups does not tip neighboring groups into its basin of 
attraction. Increasing the degree to which strategy 1 is group-beneficial (i.e., the magnitude 
of g) enlarges the range of parameters that lead to the increase in that strategy. Here, 
the number of groups, n, was 32, but results are insensitive to n as long as it is sufficiently 
large. Very small values of n increase the range of parameters under which the group- 
beneficial trait spreads. These results are from simulation — if the group-beneficial trait 
had not spread to all groups after 10,000 time periods, we assumed it would not spread. 
To construct the graph, we chose values of m/s and then used an interval-halving algorithm 
to find the threshold value of p at which trait 1 did not spread. 

If there is too little mixing, the group-beneficial behavior remains common in the
initial population but cannot spread because there is not enough interaction be- 
tween neighbors for the beneficial effects of the norm to cause it to spread. If 
the domain of attraction of the group-beneficial strategy is too small, the flow of 
ideas from successful groups to less successful groups may not be sufficient to tip 
neighboring groups into its basin of attraction. Increasing the degree to which 
strategy 1 is group-beneficial (i.e., the magnitude of g) enlarges the range of
parameters that lead to the increase in that strategy. 
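
The interval-halving search described in the caption to figure 12.2 can be sketched as follows. The code assumes the two-behavior ring model with payoffs W1 = 1 + s(x - p) + g x and W2 = 1 + g x, and it uses the imitation update summed over the models' behaviors, so that only the payoff to behavior 1 and each group's mean payoff appear. The number of groups, the cut-off of 6,000 time periods, the 0.99 criterion for "spread to all groups," and the starting bracket are hypothetical choices made to keep the example quick; the caption reports 32 groups and a cut-off of 10,000 periods.

```python
def step(x, m, s, threshold, g, beta):
    """One time period for the frequency of behavior 1 in each group on a ring."""
    n = len(x)
    w1 = [1.0 + s * (xi - threshold) + g * xi for xi in x]          # payoff to behavior 1
    wbar = [1.0 + g * xi + s * xi * (xi - threshold) for xi in x]   # group mean payoff
    new = []
    for f in range(n):
        total = 0.0
        for d, weight in ((f, 1.0 - 2.0 * m), ((f - 1) % n, m), ((f + 1) % n, m)):
            total += weight * 0.5 * (x[f] * (1.0 + beta * (w1[f] - wbar[d]))
                                     + x[d] * (1.0 + beta * (w1[d] - wbar[f])))
        new.append(total)
    return new


def spreads(m, s, threshold, g, beta=1.0, n=12, max_steps=6000):
    """True if behavior 1, initially fixed in one group, becomes common in all groups."""
    x = [0.0] * n
    x[0] = 1.0
    for _ in range(max_steps):
        x = step(x, m, s, threshold, g, beta)
        if min(x) > 0.99:
            return True
    return False


def find_threshold(m, s, g, lo=0.3, hi=0.5, iterations=6):
    """Interval halving on p: behavior 1 must spread at lo and fail to spread at hi."""
    if not spreads(m, s, lo, g) or spreads(m, s, hi, g):
        raise ValueError("initial bracket does not straddle the threshold")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if spreads(m, s, mid, g):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `find_threshold(0.01, 0.1, 0.25)` returns an estimate of the largest p at which behavior 1 still spreads for those hypothetical values of m, s, and g. Repeating the search over a grid of m/s values would trace out one boundary of a shaded region like those in the figure.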

The results plotted in figure 12.3 show that the group-beneficial trait 
spreads at a rate that is roughly comparable with the rate at which individually 




Figure 12.3. This figure plots a measure of the length of time necessary for the spread of 
the group-beneficial trait relative to the length of time necessary for the spread of an 
individually advantageous trait. In the simulations reported, the group-beneficial trait 
spreads from one group to the next at a constant rate after an initial transient period. Here, 
we plot a ratio of two times: the time necessary to increase from a frequency of 0.1 to 0.9 in a
single group at the boundary of the wave spreading at the constant rate, divided by the
length of time necessary for a purely advantageous trait with dynamics $\Delta p = sp(1 - p)$ to
spread from 0.1 to 0.9 in a single isolated population, for two different values of the ratio m/s.
As in figure 12.1, m is the probability of interacting with, and potentially imitating, an
individual in each of the two neighboring groups. In both graphs, g = 1.0, and the parameter p
is the unstable equilibrium that divides the basins of attraction of the group-beneficial
trait and the other trait. These results indicate that spatial structure causes an initially 
individually disadvantageous but group-beneficial trait to spread on roughly the same 
time scale as a simple individually advantageous trait whose within-group dynamics are 
governed by the same rate parameter s. 
beneficial traits spread within a single group under the influence of the same
learning process. Thus, if an individually beneficial trait can spread within a
population in 10 years, a group-beneficial trait will spread from one population
to the next in 15-30 years, depending on the amount of mixing and the effect of
the trait on average fitness. Game theorists have considered a number of mech- 
anisms of equilibrium selection that arise because of random fluctuations in 
outcomes due to sampling variation and finite number of players (Kandori et al., 
1993; Ellison, 1993; Young, 1998; Samuelson, 1997). These processes tend to
pick out the equilibrium with the largest domain of attraction. However, unless 
spatial structure limits interactions to a small number of individuals, the rate at 
which this occurs in a large population is very slow. Similarly, group selection 
models appear to require unrealistically high group extinction rates to explain 
many examples of the spread of group-beneficial cultural traits (Boyd and 
Richerson, 1990; Soltis et al., 1995). In contrast, the process we describe here
leads to the deterministic spread of the group-beneficial trait on roughly the 
same time scale as the same social learning processes cause individually beneficial 
traits to spread within groups. 
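
As a rough illustration of the baseline time scale used in this comparison, the following Python sketch counts how many time periods a purely advantageous trait needs to rise from a frequency of 0.1 to 0.9 under the recursion Δp = sp(1 − p) given in the caption to figure 12.3. The function name and the example values of s are our own illustrative choices, and the sketch does not reproduce the spatial simulations behind the figure.

```python
def periods_to_spread(s, start=0.1, end=0.9):
    """Count the time periods needed for the frequency p of an individually
    advantageous trait to rise from `start` to `end` under the recursion
    p <- p + s * p * (1 - p)."""
    p = start
    periods = 0
    while p < end:
        p += s * p * (1.0 - p)
        periods += 1
    return periods


if __name__ == "__main__":
    # Illustrative values of the rate parameter s (assumptions, not results).
    for s in (0.05, 0.1, 0.2):
        print(f"s = {s}: {periods_to_spread(s)} time periods")
```

Multiplying this baseline by the ratio plotted in figure 12.3 gives the corresponding time for the group-beneficial trait to move from one group to the next.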

Of course, we have not accounted for the processes that influence the rate at 
which the beneficial behavior initially becomes common in a particular group. 
However, if the conditions for spread are satisfied, the group-beneficial trait 
needs to become common only in a single group. If we imagine that group- 
beneficial traits mainly arise as a result of random processes in small populations, 
only the initial group, not the whole population, needs to be small, and the group 
must remain small only for long enough for random processes to give rise to an 
initial “group mutation,” which can then spread relatively rapidly to the pop- 
ulation as a whole. If we imagine that rare events, such as the emergence of 
uniquely charismatic reformers or alignment of the particular constellations of 
political forces, are required to effect a group-favoring innovation, the same
considerations apply. Only one group need make the original innovation; any 
others with substantial cultural contact can rapidly acquire the trait by the 
mechanism we model here. 


Recombination at the Group Level 

The process described here readily leads to the recombination of group- 
beneficial strategies that initially arise in different groups. The exact combi- 
nation of strategies necessary to support complex, adaptive social institutions 
would seem unlikely to arise through a single chance event. It is much more 
plausible that complex institutions are assembled in numerous small steps. 
Previous group selection models of equilibrium selection are analogous to the 
evolution of an asexual population in that they lack any mechanism that allows 
the recombination of beneficial strategies that arise in different populations 
and thus require innovations to occur sequentially in the same lineage. Within- 
group models in which equilibrium selection occurs through random sampling 
processes assume that the population has reached a stationary distribution, 
and thus while recombination is possible, there is no cumulative, irreversible
change. By contrast, the present model allows recombination of different
strategies and irreversible, cumulative change. To see this, consider a model
in which strategies consist of two components (x, y), each with two values






Figure 12.4. In (a), (b), and (c), the upper graph plots the frequencies of the four possible
strategies as stacked bar graphs for each of 32 groups: (0, 0) white, (1, 0) light gray, (0, 1)
dark gray, and (1, 1) black. The lower graph plots the payoff to each strategy net of the
group effects in each group. The solid line gives the payoff of (0, 0), and the circles (• • •)
give the payoffs of the other three strategies. The parameters are m = 0.02, s = 0.1, p = 0.4,
and g = 2. (a) Initially (0, 1) is common in group 8 and (1, 0) is common in group 24, and
the two group-beneficial traits begin to spread. (b) When the two spreading fronts meet,
the frequencies of x = 1 and y = 1 are one half, which means that the strategy (1, 1) has the
highest payoff. (c) Recombination at the individual level introduces strategy (1, 1) into the
boundary group, and strategy (1, 1) then spreads deterministically, first in that group and
then to adjacent groups.
(0, 1). Let p_d and q_d be the frequencies of x = 1 and y = 1 in group d, re-
spectively. Let the payoff of an individual in group d be as follows:

$$W_d(x, y) = 1 + s x (p_d - p) + s y (q_d - p) + g (q_d + p_d) \tag{4}$$

Thus, both x = 1 and y = 1 have an independent group-beneficial effect, and all 
four combinations of x and y can be stable equilibria in isolated groups. Finally, 
suppose that individuals occasionally learn the x component of their strategy 
from one individual and the y component from another, leading to recombination 
of behavioral strategies at the individual level. Once again suppose that the 
population is initially all strategy (0, 0), and that random shocks cause (1, 0) to
become common in one population and (0, 1) common in a second population.
Then, if conditions are right, both strategies will begin to spread (figure 12.4(a)).
When the two waves meet, the frequency of x = 1 is equal to one half and the
frequency of y = 1 is equal to one half at the boundary between the two ex-
panding fronts. The outcome depends on the value of p. If p < 1/2, the strategy
(1, 1) has the highest payoff in the group on the boundary, increases deter-
ministically in that group, and eventually spreads throughout the population as a
whole (figure 12.4(b)). If p > 1/2, the strategy (1, 1) has a lower payoff than (1, 0)
or (0, 1), and the two waves form a stable boundary. However, in the boundary
group, the most beneficial combination, (1, 1), has a relatively small payoff
disadvantage compared with (0, 1), and (0, 1) is present at substantial frequency.
In this situation, a shift to the most beneficial combination due to random shocks
is much more likely than the shifts that were necessary to cause (0, 1) and (1, 0)
to become common in the first place. Thus, existing group-beneficial traits will 
recombine more rapidly than new ones arise. 
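
The comparison of payoffs in the boundary group can be checked numerically. The Python sketch below evaluates equation (4) with the frequencies of x = 1 and y = 1 both set to one half, using the values s = 0.1 and g = 2 from the caption to figure 12.4. The function names, and the second threshold value p = 0.6, are illustrative assumptions of ours and are not taken from the simulations.

```python
def payoff(x, y, p_d, q_d, s, p, g):
    """Equation (4): payoff of strategy (x, y) in a group where the
    frequencies of x = 1 and y = 1 are p_d and q_d, respectively."""
    return 1 + s * x * (p_d - p) + s * y * (q_d - p) + g * (q_d + p_d)


def boundary_payoffs(s, p, g):
    """Payoffs of all four strategies in the group where the two spreading
    fronts meet, i.e. where p_d = q_d = 1/2."""
    strategies = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return {xy: payoff(xy[0], xy[1], 0.5, 0.5, s, p, g) for xy in strategies}


if __name__ == "__main__":
    # p = 0.4 is the value used in figure 12.4; p = 0.6 is a hypothetical
    # value above one half, included only for contrast.
    for p in (0.4, 0.6):
        print(f"p = {p}: {boundary_payoffs(s=0.1, p=p, g=2)}")
```

Because the group-effect term g(q_d + p_d) is shared by all four strategies, only the sign of 1/2 − p determines whether adding a second component raises or lowers an individual's payoff in the boundary group.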


Conclusion 

Many anthropologists and sociologists have long believed that human behavior is 
regulated by culturally transmitted norms in ways that promote the survival and 
growth of human societies. Economists and other rational choice theorists have 
been skeptical about such functionalist claims because there was no plausible 
mechanism to explain why such norms should be common. Social scientists 
influenced by evolutionary biology tend to share this skepticism based upon 
theoretical models and empirical findings suggesting that group selection is 
generally a weak force in nature. We believe that humans are an exception to this 
rule because cultural variation is much more susceptible to group selection than 
genetic variation. The cultural group selection hypothesis explains both why 
humans cooperate on such a large scale and why the pattern of this cooperation 
is so different from that of other ultrasocial animals (Richerson and Boyd, 1999).
Human societies are based upon cooperation between nonrelatives, while kin- 
ship underlies cooperation and complex sociality in other taxa like the social 
Despite a general fit between the existing models of cultural group selection
and the facts of human sociality, much uncertainty remains. Earlier work sug- 
gests that the differential survival of culturally distinctive groups can lead to the 
evolution of group-beneficial behavior under plausible circumstances, but that 
this process is quite slow and likely to produce historically contingent group- 
level adaptations (Boyd and Richerson, 1982, 1990; Soltis et al., 1995). Since the 
evolution of human social institutions does have a time scale of millennia and the 
resulting institutions are highly variable, such group selection processes may 
have had a role in shaping these institutions. On the other hand, some social 
institutions do diffuse from one society to another and on time scales shorter 
than a millennium. The spread of the joint stock company on time scales of 
a century is a recent example. Such events accord better with a mechanism like 
the one we model here. 

We suspect that both differential survival and differential diffusion may 
affect the evolution of human social institutions. The operation of many social 
institutions is opaque even to the people who enact them (Nelson and Winter, 
1982, ch. 5), and such institutions are even harder for outsiders to understand. In 
such cases, diffusion may be ineffective because actors cannot connect the attri- 
butes of particular institutions to their success, and this fact may explain why the 
path from the origins of agriculture to our complex modern industrial nations 
took some 10 millennia to traverse. Other institutions spread much more readily 
because their costs and benefits are more readily understood. Proselytizing re- 
ligions, for example, take pains to be transparent to potential converts and thus 
may readily spread. The rate of diffusion of institutions may also be affected by 
how much people know about other societies. It is plausible that the spread of 
literacy and the development of ever better means of transportation have 
gradually increased the importance of the rapid processes based on borrowing 
relative to the slower ones based on group extinction. In the twentieth century, 
social institutions like central banks, soccer, and government bureaucracies have 
become all but universal in about a century. Nevertheless, globalization is in- 
complete; dramatic differences exist even between modern societies (Nisbett, 
Peng, Choi, and Norenzayan, 2001). Some elements of culture likely still have 
time scales of change measured in millennia. 


NOTE 
13 The Evolution of Altruistic
Punishment 

With Herbert Gintis and Samuel Bowles 


Unlike any other species, humans cooperate with nonkin in large 
groups. This behavior is puzzling from an evolutionary perspective because 
cooperating individuals incur individual costs to confer benefits on unrelated 
group members. None of the mechanisms commonly used to explain such be- 
havior allows the evolution of altruistic cooperation in large groups. Repeated in- 
teractions may support cooperation in dyadic relations (Axelrod and Hamilton, 
1981; Trivers, 1971; Clutton-Brock and Parker, 1995), but this mechanism is
unsustainable if the number of individuals interacting strategically is larger than a
handful (Boyd and Richerson, 1998). Interdemic group selection can lead to the
evolution of altruism only when groups are small and migration is infrequent
(Sober and Wilson, 1998; Eshel, 1972; Aoki, 1982; Rogers, 1990). A third re-
cently proposed mechanism (Hauert, De Monte, Hofbauer, and Sigmund, 2002)
requires that asocial, solitary types outcompete individuals living in uncooper- 
ative social groups, an implausible assumption for humans. 

Altruistic punishment provides one solution to this puzzle. In laboratory 
experiments, people punish noncooperators at a cost to themselves even in one-
shot interactions (Fehr and Gächter, 2002; Ostrom, Gardner, and Walker,
1994), and ethnographic data suggest that such altruistic punishment helps to
sustain cooperation in human societies (Boehm, 1993). It might seem that in-
voking altruistic punishment simply creates a new evolutionary puzzle: why do
people incur costs to punish others and provide benefits to nonrelatives? How- 
ever, here we show that group selection can lead to the evolution of altruistic 
punishment in larger groups because the problem of deterring free riders in the 
case of altruistic cooperation is fundamentally different from the problem of 
deterring free riders in the case of altruistic punishment. This asymmetry arises 
because the payoff disadvantage of altruistic cooperators relative to defectors is
independent of the frequency of defectors in the population, whereas the cost 
disadvantage for those engaged in altruistic punishment declines as defectors 
become rare because acts of punishment become very infrequent (Sethi and 
Somanathan, 1996). Thus, when altruistic punishers are common, individual-
level selection operating against them is weak.

To see why, consider a model in which a large population is divided into 
groups of size n. There are two behavioral types, contributors and defectors. 
Contributors incur a cost (c) to produce a total benefit (b) that is shared equally
among group members. Defectors incur no costs and produce no benefits. If the
fraction of contributors in the group is x, the expected payoff for contributors is
bx − c and the expected payoff for defectors is bx, so the payoff disadvantage of
the contributors is a constant c independent of the distribution of types in the
population. Now add a third type, “punishers,” who cooperate and then punish
each defector in their group, reducing each defector’s payoff by p/n at a cost k/n
to the punisher. If the frequency of punishers is y, the expected payoffs become
b(x + y) − c to contributors, b(x + y) − py to defectors, and b(x + y) − c − k(1 −
x − y) to punishers. Contributors have higher fitness than defectors if punishers
are sufficiently common that the cost of being punished exceeds the cost of co-
operating (py > c). Punishers suffer a fitness disadvantage of k(1 − x − y) com-
pared with nonpunishing contributors. Thus, punishment is altruistic and mere
contributors are “second-order free riders.” Note, however, that the payoff 
disadvantage of punishers relative to contributors approaches zero as defectors 
become rare because there is no need for punishment. In a more realistic model 
(like the one we show), the costs of monitoring or punishing occasional mistaken
defections would mean that punishers have slightly lower fitness than contrib- 
utors and that defection is the only one of these three strategies that is an 
evolutionarily stable strategy in a single isolated population. However, the fact 
that punishers experience only a small disadvantage when defectors are rare 
means that weak within-group evolutionary forces, such as mutation (Sethi and 
Somanathan, 1996) or a conformist tendency (Henrich and Boyd, 2001), can
stabilize punishment and allow cooperation to persist. But neither produces a 
systematic tendency to evolve toward a cooperative outcome. Here we explore 
the possibility that selection among groups leads to the evolution of altruistic 
punishment when it could not maintain altruistic cooperation. 
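
The asymmetry described in this paragraph can be made concrete with a short Python sketch of the three expected payoffs. The costs c = k = 0.2 and p = 0.8 are the base case values reported in the Methods section; the benefit b = 0.5 and the particular frequencies are hypothetical numbers chosen only for illustration, and the function name is ours.

```python
def payoffs(x, y, b, c, p, k):
    """Expected payoffs in a group where contributors have frequency x,
    punishers have frequency y, and defectors have frequency 1 - x - y.

    Returns a tuple (contributor, defector, punisher)."""
    contributor = b * (x + y) - c
    defector = b * (x + y) - p * y
    punisher = b * (x + y) - c - k * (1 - x - y)
    return contributor, defector, punisher


if __name__ == "__main__":
    b, c, p, k = 0.5, 0.2, 0.8, 0.2  # b is an assumed value
    # Hold contributors at 10% and let punishers become more common.
    for y in (0.1, 0.5, 0.9):
        con, dfc, pun = payoffs(0.1, y, b, c, p, k)
        print(f"y = {y}: contributor - defector = {con - dfc:+.3f}, "
              f"contributor - punisher = {con - pun:+.3f}")
```

The first difference changes sign once py exceeds c, while the second shrinks toward zero as defectors become rare, which is the sense in which selection against punishers is weak when they are common.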

Suppose that more cooperative groups are less prone to extinction. Humans 
always live in social groups in which cooperative activities play a crucial role. In 
small-scale societies, such groups frequently become extinct (Soltis, Boyd, and 
Richerson, 1995). It is plausible that more cooperative groups are less subject to
extinction because they are more effective in warfare, more successful in coin- 
suring, more adept at managing common resources, or for similar reasons. This 
means that, all other things being equal, group selection will tend to increase the 
frequency of cooperation in the population. Because groups with more punishers 
will tend to exhibit a greater frequency of cooperative behaviors (by both con-
tributors and punishers), the frequency of punishers and cooperative behaviors
will be positively correlated across groups. As a result, punishment will increase 
as a “correlated response” to group selection that favors more cooperative 
groups. Because selection within groups against punishment is weak when pun-
ishment is common, this process might support the evolution of substantial levels
of punishment and maintain punishment once it is common. 

To evaluate this intuitive argument we studied the following model using 
simulation methods. There are N groups. Local density-dependent competition 
maintains each group at a constant population size n. Individuals interact in a 
two-stage “game.” During the first stage, contributors and punishers cooperate 
with probability 1 − e and defect with probability e. Cooperation reduces the
payoff of cooperators by an amount c and increases the ability of the group to 
compete with other groups. For simplicity, we begin by assuming that cooper- 
ation has no effect on the individual payoffs of others, but does reduce the 
probability of group extinction. Defectors always defect. During the second 
stage, punishers punish each individual who defected during the first stage. After 
the second stage, individuals encounter another individual from their own group 
with probability 1 − m and an individual from another randomly chosen group
with probability m. An individual i who encounters an individual j imitates j with
probability W_j/(W_j + W_i), where W_x is the payoff of individual x in the game,
including the costs of any punishment received or delivered. Thus, imitation has 
two distinct effects: first, it creates a selection-like process that causes higher 
payoff behaviors to spread within groups. Second, it creates a migration-like 
process that causes behaviors to diffuse from one group to another at a rate pro- 
portional to m. Because cooperation has no individual-level benefits, defectors 
spread between groups more rapidly than do contributors or punishers. Group 
selection occurs through intergroup conflict (Bowles, 2001). In each time period,
groups are paired at random, and with probability ε, intergroup conflict results in
one group defeating and replacing the other group. The probability that group i
defeats group j is 1/2(1 + (d_j − d_i)), where d_q is the frequency of defectors in
group q. This means that the group with more defectors is more likely to lose a
conflict. Note that cooperation is the sole target of the resulting group selection
process; punishment increases only to the extent that the frequency of punishers
is correlated with that of cooperation across groups. Finally, with probability μ
individuals of each type spontaneously switch into one of the two other types.
Mutation and erroneous defection ensure that punishers will incur some pun- 
ishment costs, even when they are common, thus placing them at a disadvantage 
with respect to the contributors. 
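
The two probabilistic rules at the heart of this model, payoff-biased imitation and intergroup conflict, can be sketched in Python as follows. This is a minimal illustration under our own assumptions (function names, the use of Python's random module, and payoffs taken to be positive so that the imitation ratio is a valid probability); it is not the Visual Basic or Delphi code used for the reported simulations.

```python
import random


def imitation_probability(w_i, w_j):
    """Probability that individual i imitates individual j, given their
    payoffs. Assumes both payoffs are positive."""
    return w_j / (w_j + w_i)


def win_probability(d_i, d_j):
    """Probability that group i defeats group j, where d_i and d_j are the
    frequencies of defectors in the two groups."""
    return 0.5 * (1.0 + (d_j - d_i))


def conflict_round(defector_freqs, conflict_prob, rng):
    """Pair groups at random; each pair fights with probability
    conflict_prob. Returns a list of (winner, loser) group indices."""
    order = list(range(len(defector_freqs)))
    rng.shuffle(order)
    outcomes = []
    for a, b in zip(order[::2], order[1::2]):
        if rng.random() < conflict_prob:
            if rng.random() < win_probability(defector_freqs[a],
                                              defector_freqs[b]):
                outcomes.append((a, b))
            else:
                outcomes.append((b, a))
    return outcomes


if __name__ == "__main__":
    rng = random.Random(0)
    freqs = [0.0, 0.2, 0.8, 1.0]  # hypothetical defector frequencies
    print(conflict_round(freqs, conflict_prob=1.0, rng=rng))
```

In a full simulation the losing group would then be replaced by a copy of the winner, and imitation, punishment, and mutation would be applied within each time period; those steps are omitted here.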


Methods 

Two simulation programs implementing the model were independently written, 
one by R. B. in Visual Basic, and a second by H. G. in Delphi. Code is available 
on request. Results from the two programs are highly similar. In all simulations 
there were 128 groups. Initially one group consisted of all altruistic punishers 
and the other 127 groups were all defectors. Various random processes could 
cause such an initial shift. Sampling variation in who is imitated (Gale, Binmore, 
and Samuelson, 1995) could increase the frequency of punishers. Randomly
varying environments can lead to similar shifts (Price, Turelli, and Slatkin, 1993)
in populations. Finally, individual learning can be conceptualized as a process
in which individuals use data from the environment to infer the best behavior. 
Learning experiences of individuals within a population may often be correlated 
because they are using the same data. Thus, random variation in such correlated 
learning experiences could also cause equilibrium shifts in large populations. We 
do not model these processes here. Simulations were run for 2,000 time periods. 
The long-run average results plotted in figures 13.1-13.4 represent the average 
of frequencies over the last 1,000 time periods of 10 simulations. 

Base case parameters were chosen to represent cultural evolution in small- 
scale societies. We set the time period to be 1 year. Because individually bene- 
ficial cultural traits, such as technical innovations, diffuse through populations 
in 10-100 years (Rogers, 1983), we set the cost of cooperation, c, and punishing,
k, so that traits with this cost advantage would spread in 50 time periods 
Figure 13.1. The evolution of cooperation is strongly affected by the presence of
punishment. (a) The long-run average frequency of cooperation (i.e., the sum of the
frequencies of contributors and punishers) as a function of group size when there is no
punishment (p = k = 0) for three different conflict rates, 0.075, 0.015, and 0.003. Group
selection is ineffective unless groups are quite small. (b) When there is punishment
(p = 0.8, k = 0.2), group selection can maintain cooperation in substantially larger groups.
Figure 13.2. The evolution of cooperation is strongly affected by rate of mixing between
groups. (a) The long-run average frequency of cooperation (i.e., the sum of the frequencies
of contributors and punishers) as a function of group size when there is no punishment
(p = k = 0) for three mixing rates, 0.002, 0.01, and 0.05. Group selection is ineffective
unless groups are quite small. (b) When there is punishment (p = 0.8, k = 0.2), group
selection can maintain cooperation in larger groups for all rates of mixing. However, at
higher rates of mixing, cooperation does not persist in the largest groups.


(c = k = 0.2). To capture the intuition that in human societies punishment is
more costly to the punishee than to the punisher, we set the cost of being
punished to four times the cost of punishing (p = 0.8). We assume that erro-
neous defection is relatively rare (e = 0.02). The migration rate, m, was set so
that in the absence of any other evolutionary forces (i.e., c = p = k = e = ε = 0),
passive diffusion will cause two neighboring groups that are initially as different
as possible to achieve the same trait frequencies in ≈50 time periods (m = 0.01),
a value that approximates the migration rates in a number of small-scale societies
(Harpending and Rogers, 1986). We set the value of the mutation rate so that
the long-run average frequency of an ordinary adaptive trait with payoff ad-
vantage c is ≈0.9 (μ = 0.01). This means that mutation maintains considerable
variation, but not so much as to overwhelm adaptive forces. We assume that the
average group extinction rate is consistent with a recent estimate of cultural
Figure 13.3. The evolution of cooperation is sensitive to the cost of being punished (p).
Here we plot the long-run average frequency of cooperation with the base case cost of
being punished (p = 0.8) and with a lower value of p. Lower values of p result in much
lower levels of cooperation.



extinction rates in small-scale societies, ≈0.0075 (Soltis et al., 1995). Because
only one of the two groups entering into a conflict becomes extinct, this implies
that ε = 0.015.
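
For reference, the base case values stated in this section can be gathered in one place, together with the arithmetic that links the per-group extinction rate to the per-pair conflict rate. The Python sketch below is only a bookkeeping aid: the dictionary keys and function name are our own labels, and group size is omitted because it is varied across the figures.

```python
def conflict_rate_from_extinction_rate(extinction_rate):
    """Each conflict involves two groups but eliminates only one, so the
    per-pair conflict rate is twice the per-group extinction rate."""
    return 2.0 * extinction_rate


# Base case values as stated in the text; key names are illustrative.
BASE_CASE = {
    "cost_of_cooperating_c": 0.2,
    "cost_of_punishing_k": 0.2,
    "cost_of_being_punished_p": 0.8,
    "error_rate_e": 0.02,
    "mixing_rate_m": 0.01,
    "mutation_rate": 0.01,
    "extinction_rate": 0.0075,
    "number_of_groups": 128,
    "time_periods": 2000,
}
BASE_CASE["conflict_rate"] = conflict_rate_from_extinction_rate(
    BASE_CASE["extinction_rate"]
)

if __name__ == "__main__":
    for name, value in BASE_CASE.items():
        print(f"{name}: {value}")
```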


Results 

Simulations using this model indicate that group selection can maintain altruistic 
punishment and altruistic cooperation over a wider range of parameter values 
than group selection will sustain altruistic cooperation alone. Figure 13.1 com- 
pares the long-run average levels of cooperation with and without punishment for 
a range of group sizes and extinction rates. If there is no punishment, our simu- 
lations replicate the standard result: group selection can support high frequencies 


Figure 13.4. Punishment does not aid in the evolution of cooperation when the costs
borne by punishers are fixed, independent of the number of defectors in the group. Here
we plot the long-run average frequency of cooperation when the costs of punishing are
proportional to the frequency of defectors (variable cost), fixed at a constant cost equal
to the cost of cooperating (c), and when there is no punishment.



of cooperative behavior only if groups are quite small. However, adding pun- 
ishment sustains substantial amounts of cooperation in much larger groups. As 
one would expect, increasing the rate of extinction increases the long-run average 
amount of cooperation. 

In this model, group selection leads to the evolution of cooperation only if 
migration is sufficiently limited to sustain substantial between-group differences 
in the frequency of defectors. Figure 13.2 shows that when the migration rate 
increases, levels of cooperation fall precipitously. When punishers are common, 
defectors do badly, but when punishers are rare, defectors do well. Thus, the 
imitation of high payoff individuals creates a selection-like adaptive force that 
acts to maintain variation between groups in the frequency of defectors. How- 
ever, if there is too much migration, this process cannot maintain enough vari- 
ation between groups for group selection to be effective. 

The long-run average amount of cooperation is also sensitive to the cost of 
being punished (figure 13.3). When the cost of being punished is at base case
value (p = 4k), even a modest frequency of punishers will cause defectors to
be selected against, and, as a result, there is a substantial correlation between the 
frequency of cooperation and punishment across groups. When the cost of being 
punished is twice the cost of cooperation (p = 2k), punishment does not suffi- 
ciently reduce the relative payoff of defectors, and the correlation between the 
frequency of cooperators and punishers declines. Lower correlations mean that 
selection among groups cannot compensate for the decline of punishers within 
groups, and eventually both punishers and contributors decline. 

It is important to see that punishment leads to increased cooperation only 
to the extent that the costs associated with being a punisher decline as defectors 
become rare. Monitoring costs, for example, must be paid whether or not there 
are any defectors. When such costs are substantial, or when the probability of 
mistaken defection is high enough that punishers bear significant costs even 
when defectors are rare, group selection does not lead to the evolution of al-
truistic punishment (figure 13.4). However, because people live in long-lasting
social groups and language allows the spread of information about who did what, 
it is plausible that monitoring costs may often be small compared with en- 
forcement costs. This result also leads to an empirical prediction: people should 
be less inclined to pay fixed than variable punishment costs if the mechanism 
outlined here is responsible for the psychology of altruistic punishment. 

Further sensitivity analyses suggest that these results are robust. In addition 
to the results described, we have studied the sensitivity of the model to varia- 
tions in the remaining parameter values. Decreasing the mutation rate sub- 
stantially increases the long-run average levels of cooperation. Random drift-like 
processes have an important effect on trait frequencies in this model. Standard 
models of genetic drift suggest that lower mutation rates will cause groups to 
stay nearer the boundaries of the state space (Crow and Kimura, 1970), and our
simulations confirm this prediction. Increasing mutation rate, on average, in- 
creases the amount of punishment that must be administered and therefore 
increases the payoff advantage of second-order free riders compared with al- 
truistic punishers. Increasing e, the error rate, reduces the long-run average 
amount of cooperation. Reducing the number of groups, N, adds random noise 
to the results. We also tested the sensitivity of the model to three structural
changes. We modified the payoffs so that each cooperative act produces a per 
capita benefit of b/n for each other group member and modified the extinction
model so that the probability of group extinction is proportional to the differ- 
ence between warring groups in average payoffs including the costs of punish- 
ment, rather than simply the difference in frequency of cooperators. The 
dynamics of this model are more complicated because now group selection acts 
against punishers because punishment reduces mean group payoffs. However, 
the correlated effect of group selection on cooperation still tends to increase 
punishment as in the original model. The relative magnitude of these two effects 
depends on the magnitude of the per capita benefit to group members of each 
cooperative act, b/n. For reasonable values of b (2c, 4c, and 8c), the results of this
model are qualitatively similar to those shown. We also investigated a model in 
which cooperation and punishment are characters that vary continuously from 
zero to one. An individual with cooperation value x behaves like a cooperator 
with probability x and like a defector with probability 1 − x. Similarly, an in-
dividual with a punishment value y behaves like a punisher with probability y
and like a nonpunisher with probability 1 − y. New mutants are uniformly dis-
tributed. The steady-state mean levels of cooperation in this model are similar to
the base model. Finally, we studied a model without extinction analogous to a 
recent model of selection among stable equilibria because of biased imitation 
(Boyd and Richerson, 2002). Populations are arranged in a ring, and individuals
imitate only individuals drawn from the neighboring two groups. Cooperative 
acts produce a per capita benefit b/n so that groups with more cooperators have
higher average payoff, and thus cooperation will, all other things being equal, 
tend to spread because individuals are prone to imitate successful neighbors. We 
could find no reasonable parameter combination that led to significant long-run 
average levels of cooperation in this last model. 


Discussion 

We have shown that although the logic underlying altruistic cooperation and 
altruistic punishment is similar, their evolutionary dynamics are not. In the 
absence of punishment, within-group adaptation acts to decrease the frequency 
of altruistic cooperation, and as a consequence weak drift-like forces are insuf- 
ficient to maintain substantial variation between groups. In groups in which 
altruistic punishers are common, defectors are excluded, and this maintains 
variation in the amount of cooperation between groups. Moreover, in such 
groups punishers bear few costs, and punishers decrease only very slowly in 
competition with contributors. As a result, group selection is more effective at 
maintaining altruistic punishment than altruistic cooperation. 

These results suggest that group selection can play an important role in 
the cultural evolution of cooperative behavior and moralistic punishment in 
humans. The importance of group selection is always a quantitative issue. There 
is no doubt that selection among groups acts to favor individually costly, group-
beneficial behaviors. The question is always, is group selection important under
plausible conditions? With parameter values chosen to represent cultural
evolution in small-scale societies, cooperation is sustained in groups on the order of
100 individuals. If the “individuals” in the model represent family groups (on 
grounds that they migrate together and adopt common practices), altruistic pun- 
ishment could be sustained in groups of 600 people, a size much larger than typical 
foraging bands and about the size of many ethno-linguistic units in nonagricul- 
tural societies. Group selection is more effective in this model than in standard 
models for two reasons: first, in groups in which defectors are rare, punishers 
suffer only a small payoff disadvantage compared with contributors, and, as a 
result, variation in the frequency of punishers is eroded slowly. Second, payoff- 
biased imitation maintains variation among groups in the frequency of cooper- 
ation, because in groups in which punishers are common, defectors achieve a low 
payoff and are unlikely to be imitated. 
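
The first of these two reasons can be made concrete with a small payoff calculation. The sketch below assumes a simple linear payoff structure: contributors and punishers each pay a cost c to produce a benefit b shared by the group, a defector's payoff falls by p times the frequency of punishers, and a punisher pays k times the frequency of defectors. The function name, this exact functional form, and the default parameter values are illustrative assumptions, not a restatement of the model's equations.

```python
def group_payoffs(freq_defect, freq_punish, b=0.5, c=0.2, p=0.8, k=0.2):
    """Illustrative payoffs for contributors, defectors, and punishers.

    All parameter names and defaults are hypothetical. Contributors and
    punishers both contribute, so the shared benefit depends only on the
    frequency of defectors.
    """
    if freq_defect < 0 or freq_punish < 0 or freq_defect + freq_punish > 1:
        raise ValueError("frequencies must be non-negative and sum to at most 1")
    shared = b * (1.0 - freq_defect)            # benefit every member receives
    return {
        "contributor": shared - c,
        "defector": shared - p * freq_punish,   # punished by each punisher
        "punisher": shared - c - k * freq_defect,
    }


rare = group_payoffs(freq_defect=0.01, freq_punish=0.80)
common = group_payoffs(freq_defect=0.50, freq_punish=0.40)

# Punisher disadvantage relative to contributors is k * freq_defect,
# so it shrinks toward zero as defectors become rare.
print(rare["contributor"] - rare["punisher"])
print(common["contributor"] - common["punisher"])
```

Under these assumptions the punishers' disadvantage scales with the frequency of defectors, while defectors do worse than contributors whenever p times the punisher frequency exceeds c.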

It would be possible to construct an otherwise similar genetic model in 
which natural selection played the same role that payoff-biased imitation plays in 
the present model, and there is little doubt that for analogous parameter values 
the results for such a genetic model would be very similar to the results pre- 
sented here. However, such a choice of parameters would not be reasonable for a 
genetic model because natural selection is typically much weaker than migration 
for small, neighboring social groups of humans. Our results (figure 13.2) suggest 
that for parameters appropriate for a genetic model, the group selection process 
modeled here will not be effective. It should be noted, however, that the genetic 
evolution of moral emotions might be favored by ordinary natural selection in 
social environments shaped by cultural group selection (Richerson and Boyd, 
1998; Bowles, Choi, and Hopfensitz).


Cultural Evolution of Human
Cooperation

With Joseph Henrich 


Cooperation¹ is a problem that has long interested evolutionists. In
both the Origin and Descent of Man, Darwin worried about how his theory might 
handle cases such as the social insects in which individuals sacrificed their 
chances to reproduce by aiding others. Darwin could see that such sacrifices 
would not ordinarily be favored by natural selection. He argued that honeybees 
and humans were similar. Among honeybees, a sterile worker who sacrificed her 
own reproduction for the good of the hive would enjoy a vicarious reproductive 
success through her siblings. Humans, Darwin (1874:178-179) thought,
competed tribe against tribe as well as individually, and their “social and moral
faculties” evolved under the influence of group competition: 

It must not be forgotten that although a high standard of morality 
gives but slight or no advantage to each individual man and his children 
over other men of the tribe, yet that an increase in the number of well- 
endowed men and an advancement in the standard of morality will cer- 
tainly give an immense advantage to one tribe over another. A tribe 
including many members who, from possessing in a high degree the spirit 
of patriotism, fidelity, obedience, courage, and sympathy, were always 
ready to aid one another, and to sacrifice themselves for the common 
good, would be victorious over most other tribes; and this would be 
natural selection. 

More than a century has passed since Darwin wrote, but the debate among evo- 
lutionary social scientists and biologists is still framed in similar terms — the con- 
flict between individual and prosocial behavior guided by selection on individuals 
versus selection on groups. In the meantime social scientists have developed
various theories of human social behavior and cooperation: rational choice
theory takes an individualistic approach while functionalism analyzes the group- 
advantageous aspects of institutions and behavior. However, unlike more tradi- 
tional approaches in the social sciences, evolutionary theories seek to explain both 
contemporary behavioral patterns and the origins of the impulses, institutions, 
and preferences that drive behavior. 

In this chapter we refer to “culture” as the information stored in individual 
brains (or in books and analogous media) that was acquired by imitation of, 
or teaching by, others. Because culture can be transmitted forward through time 
from one person to another and because individuals vary in what they learn from 
others, culture has many of the same properties as the genetic system of in- 
heritance but also, of course, many differences. The formal import of the anal- 
ogies and disanalogies has been worked out in some analytical detail (e.g., 
Cavalli-Sforza and Feldman, 1981; Boyd and Richerson, 1985). We also sub- 
scribe to Price’s approach to the concept of group selection. Heritable variation 
between entities can appear at any level of organization, and any level above the 
individual merits the term group selection (Henrich, 2004a; Hamilton, 1975;
Price, 1972; Sober and Wilson, 1998). Here we focus on the more conventional 
notion that selection on variation between fairly large social units counts as 
group selection. In fact, we have in mind, like Darwin and Hamilton, selection 
among tribes of at least a few hundred people, so we are referring to the cultural 
analog of what is sometimes called interdemic group selection. 
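
Price's approach can be stated compactly. The following is a standard two-level form of the Price equation, given as a sketch rather than as the formulation used in the works cited. The notation is an assumption introduced here: z is a trait value, w is fitness, i indexes individuals, g indexes groups, W_g and Z_g are group means, and groups are taken to be of equal size with faithful transmission.

```latex
\bar{w}\,\Delta\bar{z}
  = \underbrace{\operatorname{Cov}_g\!\left(W_g,\, Z_g\right)}_{\text{selection between groups}}
  + \underbrace{\operatorname{E}_g\!\left[\operatorname{Cov}_i\!\left(w_{ig},\, z_{ig}\right)\right]}_{\text{selection within groups}}
```

For an altruistic trait the within-group term is negative and the between-group term is positive, so the outcome depends on how much of the heritable variation lies between groups.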


Theories of Cooperation 

We draw evidence about cooperation from many sources. Ethnographic and 
historical sources include diverse religious doctrines, norms, and customs, as well 
as folk psychology. Anthropologists and historians document an immense di- 
versity of human social organizations, and most of these are accompanied by 
moral justifications, if often contested ones. Johnson and Earle (2000) provide a 
good introduction to the vast body of data collected by sociocultural anthro- 
pologists. Some important empirical topics are the focus of sophisticated work. 
For example, the cross-cultural study of commons management is already a well- 
advanced field (Baland and Platteau, 1996), drawing upon the disciplines of 
anthropology, political science, and economics. 

Human Cooperation Is Extensive and Diverse 

Human patterns of cooperation are characterized by a number of features: 

• Humans are prone to cooperate, even with strangers. Many people co- 
operate in anonymous one-shot prisoner’s dilemma games (Marwell 
and Ames, 1981) and often vote altruistically (Sears and Funk, 
1990). People begin contributing substantially to public goods sec- 
tors in economic experiments (Ostrom, 1998; Falk, Fehr, and 
Fischbacher, 2002). Experimental results accord with common 
experience. Most of us have traveled in foreign cities, even poor
foreign cities filled with strange people for whom our possessions and
spending money are worth a small fortune, and found risk of robbery 
and commercial chicanery to be small. These observations apply 
across a wide spectrum of societies, from small-scale foragers to 
modern cities in nation states (Henrich, 2004a).

• Cooperation is contingent on many things. Not everyone cooperates.
Aid to distressed victims increases substantially if a potential
altruist’s empathy is engaged (Batson, 1991). Being able to discuss a game
beforehand and to make promises to cooperate affects success
(Dawes, van de Kragt, and Orbell, 1990). The size of the resource,
technology for exclusion and exploitation of the resource, and
similar gritty details affect whether cooperation in commons
management arises (Ostrom, 1990:202-204). Scientific findings
correspond well to personal experience. Sometimes people cooperate
enthusiastically, sometimes reluctantly, and sometimes not at all. 
People vary considerably in their willingness to cooperate even un- 
der the same environmental conditions. 

• Institutions matter. People from different societies behave differently
because their beliefs, skills, mental models, values, preferences, and 
habits have been inculcated by long participation in societies with 
different institutions. In repeated play common property experi- 
ments, initial defections induce further defections until the contribu- 
tion to the public good sector approaches zero. However, if players 
are allowed to exercise strategies they might use in the real world 
(e.g., to punish those who defect), participation in the commons
stabilizes a substantial degree of cooperation (Fehr and Gachter,
2002), even in one-shot (nonrepeated) contexts. Strategies for
successfully managing commons are generally institutionalized in sets of
rules that have legitimacy in the eyes of the participants (Ostrom,
1990, ch. 2). Families, local communities, employers, nations, and
governments all tap our loyalties with rewards and punishments and 
greatly influence our behavior. 

• Institutions are the product of cultural evolution.² Richard Nisbett’s
group has shown how people’s affective and cognitive styles become
intimately entwined with their social institutions (Cohen and
Vandello, 2001; Nisbett and Cohen, 1996; Nisbett, Peng, Choi, and
Norenzayan, 2001). Because such complex traditions are so deeply
ingrained, they are slow both to emerge and decay. Many commons 
management institutions have considerable time depths (Ostrom, 
1990, ch. 3). Throughout most of human history, institutional
change was so slow as to be almost imperceptible by individuals. 
Today, change is rapid enough to be perceptible. The slow rate of 
change of institutions means that different populations experiencing 
the same environment and using the same technology often have 
quite different institutions (Kelly, 1985; Salamon, 1992).

• Variation in institutions is huge. Already with its very short list of
societies and games, the experimental ethnography approach has
uncovered striking differences (Henrich et al., 2001; Nisbett et al.,
2001). Plausibly, design complexity, coordination equilibria, and other
phenomena generate multiple evolutionary equilibria and much
historical contingency in the evolution of particular institutions (Boyd
and Richerson, 1992a); consider how different communities,
universities, and countries solve the same problems differently.

Evolutionary Models Can Explain the Nature of 
Preferences and Institutions 

These facts constrain the theories we can entertain regarding the causes of human 
cooperation. For example, high levels of cooperation are difficult to reconcile 
with the rational choice theorist’s usual assumption of self-regarding preferences, 
and the diversity of institutional solutions to the same environmental problems 
challenges any theory in which institutions arise directly from universal human 
nature. The “second-generation” bounded rational choice theory, championed 
by Ostrom (1998), has begun to address these challenges from within the rational 
choice framework. These approaches add a psychological basis and institutional 
constraints to the standard rational choice theory. Experimental studies verify 
that people do indeed behave quite differently from rational selfish expectations 
(Fehr and Gachter, 2002; Batson, 1991). Although psychological and social 
structures are invoked to explain individual behavior and its variation, an expla- 
nation for the origins and variation in psychology and social structure is not part of 
the theory of bounded rationality. 

Evolutionary theory permits us to address the origin of preferences. A num- 
ber of economists have noted the neat fit between evolutionary theory and 
economic theory (Hirshleifer, 1977; Becker, 1976). Evolution explains what
organisms want, and economics explains how they should go about getting what 
they want. Without evolution, preferences are exogenous, to be estimated em- 
pirically but not explained. The trouble with orthodox evolutionary theory is that 
its predictions are similar to predictions from selfish rationality, as we will see. At 
the same time, unvarnished evolutionary theory does do a good job of explaining 
most other examples of animal cooperation. To do a satisfactory job of explaining 
why humans have the unusual forms of social behavior depicted in our list of 
stylized facts, we need to appeal to the special properties of cultural evolution and 
more broadly to theories of culture-gene coevolution (Henrich and Boyd, 2001;
Richerson and Boyd, 1998, 1999; Henrich, 2004a). 

Such evolutionary models have both intellectual and practical payoffs. The 
intellectual payoff is that evolutionary models link answers to contemporary 
puzzles to crucial long timescale processes. The most important economic 
phenomenon of the past 500 years is the rise of capitalist economies and their 
tremendous impact on every aspect of human life. Expanding the timescale a bit, 
the most important phenomena of the last 10 millennia are the evolution of ever- 
more complex social systems and ever-more sophisticated technology following 
the origins of agriculture (Richerson, Boyd, and Bettinger, 2001). A
satisfactory explanation of both current behavior and its variation must be linked to
such long-run processes, where the times to reach evolutionary equilibria are
measured in millennia or even longer spans of time. More practically, the dynamism
of the contemporary world creates major stresses on institutions that manage 
cooperation. Evolutionary theory will often be useful because it will lead to an 
understanding of how to accelerate institutional evolution to better track rapid 
technological and economic change. Nesse and Williams (1995) provide an
analogy in the context of medical practice. 


Evolutionary Models Account for the Processes That Shape 
Heritable Genetic and Cultural Variation through Time

Evolutionary explanations are recursive. Individual behavior results from an inter- 
action of inherited attributes and environmental contingencies. In most species, 
genes are the main inherited attributes; however, inherited cultural information is 
also important for humans. Individuals with different inherited attributes may 
develop different behaviors in the same environment. Every generation, evolu- 
tionary processes — natural selection is the prototype — impose environmental 
effects on individuals as they live their lives. Cumulated over the whole popu- 
lation, these effects change the pool of inherited information, so that the in- 
herited attributes of individuals in the next generation differ, usually subtly, from 
the attributes in the previous generation. Over evolutionary time, a lineage cycles 
through the recursive pattern of causal processes once per generation, more or 
less gradually shaping the gene pool and thus the succession of individuals that 
draw samples of genes from it. Statistics that describe the pool of inherited at- 
tributes (e.g., gene frequencies) are basic state variables of evolutionary analysis. 
They are what change over time. 
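
The recursive structure just described can be illustrated with the simplest textbook case. The sketch below assumes a single haploid locus with two variants, A and B, with fixed fitnesses; the function names and parameter values are hypothetical and chosen only for illustration. The state variable is the frequency p of variant A, and each pass through the loop is one generation.

```python
def next_frequency(p, w_a, w_b):
    """One generation of selection on a two-variant haploid population.

    p is the current frequency of variant A; w_a and w_b are the
    (assumed constant) fitnesses of A and B. Returns the frequency of A
    among the next generation.
    """
    mean_fitness = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_fitness


def trajectory(p0, w_a, w_b, generations):
    """Apply the recursion repeatedly and return the list of frequencies."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs


# A small fitness advantage, compounded over many generations.
path = trajectory(p0=0.01, w_a=1.05, w_b=1.00, generations=200)
print(path[0], path[50], path[200])
```

The same loop serves for a cultural variant if w_a and w_b are read as relative rates at which the two variants are adopted, which is the sense in which the two problems share a logic.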

Note that in a recursive model, we explain individual behavior and population- 
level processes in the same model. Individual behavior depends, in any given 
generation, on the gene pool from which inherited attributes are sampled. The 
pool of inherited attributes depends in turn upon what happens to a population 
of individuals as they express those attributes. Evolutionary biologists have a long 
list of processes that change the gene frequencies, including natural selection, 
mutation, and genetic drift. However, no organism experiences natural selection. 
Organisms either live or die, reproduce or fail to reproduce, for concrete rea- 
sons particular to the local environment and the organism's own particular at- 
tributes. If, in a particular environment, some types of individuals do better than 
others, and if this variation has a heritable basis, then we label as “natural se- 
lection” the resulting changes in gene frequencies of populations. We use abstract 
categories like selection to describe such concrete events because we wish to build 
up some useful generalizations about evolutionary process. Few would argue that 
evolutionary biology is the poorer for investing effort in this generalizing project. 

Although some of the processes that lead to cultural change are very dif- 
ferent from those that lead to genetic change, the logic of the two evolutionary 
problems is very similar. For example, the cultural generation time is short in the 
case of ideas that spread rapidly, but modeling the evolution of such cultural 
phenomena (e.g., semiconductor technology) presents no special problems (Boyd
and Richerson, 1985:68-69). Similarly, human choices include ones that modify
inherited attributes directly, rather than indirectly, by natural selection. These
“Lamarckian” effects are easily added to models, and the models remain
evolutionary so long as rationality remains bounded (Young, 1998). Such models
easily handle continuous (nondiscrete) traits, low-fidelity transmission, and any
number of “inferential transformations” that might occur during transmission
(Henrich and Boyd, 2002; Cavalli-Sforza and Feldman, 1981; Boyd and
Richerson, 1985). The degenerate case of omniscient rationality, of course, needs
no recursion because everything happens in the first generation (instantly in a
typical rational choice model). The study of how genetically and culturally
inherited elements impose bounds on choice is a natural extension of the concept
of bounded rationality (Boyd and Richerson, 1993).


Evolution Is Multilevel 

Evolutionary theory is always multilevel; at a minimum, it keeps track of prop- 
erties of individuals, like their genotypes, and of the population, such as the 
frequency of a particular gene. Other levels also may be important. Individuals’
phenotypes are derived from many genes interacting with each other and the 
environment. Populations may be structured (e.g., divided into social groups 
with limited exchanges of members). Thus, evolutionary theories are systemic,
integrating every part of biology. In principle, everything that goes into causing 
change through time plays its proper part in the theory. 

This in-principle completeness led Ernst Mayr (1982) to speak of
“proximate” and “ultimate” causes in biology. Proximate causes are those that
physiologists and biochemists generally treat by asking how an organism functions.
These are the causes produced by individuals with attributes interacting with 
environments and producing effects upon them. Do humans use innate coop- 
erative propensities to solve commons problems, or do they have only self- 
interested innate motives? Or are the causes more complex than either proposal? 
Ultimate causes are evolutionary. The ultimate cause of an organism’s behavior 
is the history of evolution that shaped the gene pool from which our samples of 
innate attributes are drawn. Evolutionary analyses answer why questions. Why 
do human communities typically solve at least some of the commons dilemmas 
and other cooperation problems on a scale unknown in other apes and monkeys? 
Human-reared chimpanzees are capable of many human behaviors, but they nev- 
ertheless retain many chimpanzee behaviors and cannot act as full members of 
a human community (Savage-Rumbaugh and Lewin, 1994; Gardner, Gardner, 
and Van Cantfort, 1989). Thus, we know that humans have different innate
influences on their behavior than chimpanzees do, and these must have arisen in 
the course of the two species’ divergence from our common ancestor. 

In Darwinian evolutionary theories, the ultimate sources of cooperative 
behavior are classically categorized into three evolutionary processes operating at 
different levels of organization (for a framework unifying these classical
divisions, see Henrich, 2004a):

• Individual-level selection. Individuals and the variants they carry are 
obviously a locus of selection. Selection at this level favors selfish 
individuals who are evolved to maximize their own survival and
reproductive success. Pairs of self-interested actors can cooperate
when they interact repeatedly (Axelrod and Hamilton, 1981; Trivers,
1971). Alexander (1987) argued that such reciprocal cooperation can
also explain complex human social systems, but most formal modeling
studies make this proposal doubtful (Leimar and Hammerstein, 2001;
Boyd and Richerson, 1989). Still, some version of Alexander’s indirect
reciprocity is perhaps the most plausible alternative to the cultural
group selection hypothesis that we champion here. Most such
proposals beg the question of how humans and not other animals can take
massive advantage of indirect reciprocity (e.g., Nowak and Sigmund,
1998). Smith (2003) proposes to make language the key.³

• Kin selection. Hamilton’s (1964) articles showing that kin should
cooperate to the extent that they share genes identical by common 
descent are one of the theoretical foundations of sociobiology. Kin 
selection can lead to cooperative social systems of a remarkable scale, 
as illustrated by the colonies of termites, ants, and some bees and 
wasps. However, most animal societies are small because individuals 
have few close relatives. It is the fecundity of insects, and in one case 
rodents, that permits a single queen to produce huge numbers of 
sterile workers and hence large, complex societies composed of close 
relatives (Campbell, 1983).

• Group selection. Selection can act on any pattern of heritable variation 
that exists (Price, 1972). Darwin’s model of the evolution of
cooperation by intertribal competition is perfectly plausible, as far as it
goes. The problem is that genetic variation between groups other 
than kin groups is hard to maintain unless the migration between 
groups is very small or unless some very powerful force generates 
between-group variation (e.g., Aoki, 1982; Slatkin and Wade, 1978; 
Wilson, 1983). In the case of altruistic traits, selection will tend to
favor selfish individuals in all groups, tending to aid migration in re- 
ducing variation between groups. Success of kin selection in ac- 
counting for the most conspicuous and highly organized animal 
societies (except humans) has convinced many, but not all,
evolutionary biologists that group selection is of modest importance in
nature (for a group selectionist’s view of the controversy, see Sober 
and Wilson, 1998). It is also important to note that the problem of
maintenance of between-group variation applies only to altruistic/ 
cooperative traits, not to social behavior in general. Nearly all evo- 
lutionary biologists would agree that group selection is likely to be 
important for any social interaction with multiple stable equilibria, 
such as those coordination situations mentioned by Smith (2003).

We could make this picture much more complex by adding higher and lower 
levels of structure. Many examples from human societies will occur to the 
reader, such as gender. Indeed, Rice (1996) has elegantly demonstrated that
selection on genes expressed in the different sexes sets up a profound conflict of 
interest between these genes. If female Drosophila are prevented from evolving
defenses, male genes will evolve that seriously degrade female fitness. The
genome is full of such conflicts, usually muted by the fact that an individual's genes
are forced by the evolved biology of complex organisms to all have an equal shot 
at being represented in one’s offspring. Our own bodies are a group-selected 
community of genes organized by elaborate “institutions” to ensure fairness in 
genetic transmission, such as the lottery of meiosis that gives each chromosome 
of a pair a fair chance at entering the functional gamete (Maynard Smith and 
Szathmary, 1995).


Culture Evolves 

In theorizing about human evolution, we must include processes affecting culture 
in our list of evolutionary processes alongside those that affect genes. Culture is a 
system of inheritance. We acquire behavior by imitating other individuals much 
as we get our genes from our parents. A fancy capacity for high-fidelity imitation 
is one of the most important derived characters distinguishing us from our pri- 
mate relatives (Tomasello, 1999). We are also an unusually docile animal (Si- 
mon, 1990) and unusually sensitive to expressions of approval and disapproval 
by parents and others (Baum, 1994). Thus, parents, teachers, and peers can 
rapidly, easily, and accurately shape our behavior compared to training other 
animals using more expensive material rewards and punishments. Finally, once 
children acquire language, parents and others can communicate new ideas quite 
economically. Our own contribution to the study of human behavior is a series 
of mathematical models of what we take to be the fundamental processes of 
cultural evolution (e.g., Boyd and Richerson, 1985). Application of Darwinian 
methods to the study of cultural evolution was forcefully advocated by Campbell 
(1965, 1975). Cavalli-Sforza and Feldman (1981) constructed the first mathe- 
matical models to analyze cultural recursions. The list of processes that shape 
cultural change includes: 

• Biases. Humans do not passively imitate whatever they observe. 
Rather, cultural transmission is biased by decision rules that in- 
dividuals apply to the variants they observe or try out. The rules 
behind such selective imitation may be innate or the result of earlier 
imitation or a mixture of both. Many types of rules might be used to 
bias imitation. Individuals may try out a behavior and let reinforce- 
ment guide acceptance or rejection, or they may use various rules of 
thumb to reduce the need for costly trials and punishing errors. Rules 
like “copy the successful,” “copy the prestigious” (Henrich and Gil- 
White, 2001; Boyd and Richerson, 1985), or “copy the majority” 
(Boyd and Richerson, 1985; Henrich and Boyd, 1998) allow in- 
dividuals to acquire rapidly and efficiently adaptive behavior across a 
wide range of circumstances and play an important role in our hy- 
pothesis about the origins of cooperative tendencies in human be- 
havior (Henrich and Boyd, 2001). 

• Nonrandom variation. Genetic innovations (mutations,
recombinations) are random with respect to what is adaptive. Human individual
innovation is guided by many of the same rules that are applied to
biasing ready-made cultural alternatives. Bias and learning rules have 
the effect of increasing the rate of evolution relative to what can be 
accomplished by random mutation, recombination, and natural se- 
lection. We believe that culture originated in the human lineage as an 
adaptation to the Plio-Pleistocene ice-age climate deterioration, 
which includes much rapid, high-amplitude variation of just the sort 
that would favor adaptation by nonrandom innovation and biased 
imitation (Richerson and Boyd, 2000a, b).

• Natural selection. Since selection operates on any form of heritable 
variation and imitation and teaching are forms of inheritance, natural 
selection will influence cultural as well as genetic evolution. However, 
selection on culture is liable to favor different behaviors than selection 
on genes. Because we often imitate peers, culture is liable to selection 
at the subindividual level, potentially favoring pathogenic cultural 
variants — selfish memes (Blackmore, 1999). On the other hand, rules 
like conformist imitation have the opposite effect. By tending to sup- 
press cultural variation within groups, such rules protect variation 
between them, potentially exposing our cultural variation to much 
stronger group selection effects than our genetic variation (Soltis, 
Boyd, and Richerson, 1995; Henrich and Boyd, 1998). Human pat- 
terns of cooperation may owe much to cultural group selection. 

Evolutionary Models Are Consistent with a Wide Variety of Theories 

Evolutionary theory prescribes a method, not an answer, and a wide range of 
particular hypotheses can be cast in an evolutionary framework. If population- 
level processes are important, we can set up a system for keeping track of her- 
itable variation and the processes that change it through time. Darwinism as a 
method is not at all committed to any particular picture of how evolution works 
or what it produces. Any sentence that starts with “evolutionary theory pre- 
dicts” should be regarded with caution. 

Evolutionary social science is a diverse field (Borgerhoff Mulder, Richerson, 
Thornhill, and Voland, 1997; Laland and Brown, 2002). Our own work, which 
emphasizes an ultimate role for culture and for group selection on cultural var- 
iation, is controversial. Many evolutionary social scientists assume that culture is 
a strictly proximate phenomenon, akin to individual learning (e.g., Alexander, 
1979), or is so strongly constrained by evolved psychology as to be virtually 
proximate (Wilson, 1998). As Alexander (1979:80) puts it, “Cultural novelties 
do not replicate or spread themselves, even indirectly. They are replicated as a 
consequence of the behavior of vehicles of gene replication.” We think both 
theory and evidence suggest that this perspective is dead wrong. Theoretical 
models show that the processes of cultural evolution can behave differently in 
critical respects from those only including genes, and much evidence is consis- 
tent with these models. 

Most evolutionary biologists believe that individually costly group-beneficial 
behavior can arise only as a side effect of individual fitness maximization. We
have noted the problems with maintaining variation between groups in theory
and the seeming success of alternative explanations. Many, but by no means all, 
students of evolution and human behavior have followed the argument against 
group selection forcefully articulated by Williams (1966). 

However, cultural variation is more plausibly susceptible to group selection 
than is genetic variation. For example, if people use a somewhat conformist bias in 
acquiring important social behaviors, variation between groups needed for group 
selection to operate is protected from the variance-reducing force of migration 
between groups (Boyd and Richerson, 1985, 2002; Henrich and Boyd, 2001). 
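
A two-group toy model shows how a conformist bias can protect between-group variation against migration. The sketch below makes several assumptions that are not stated in this section: a single dichotomous trait, symmetric migration at rate m between two groups, and a conformist update of the commonly used form p' = p + D p(1 − p)(2p − 1), where D is the strength of the bias. The function names and parameter values are illustrative only.

```python
def migrate(p1, p2, m):
    """Exchange a fraction m of each group with the other group."""
    return (1.0 - m) * p1 + m * p2, (1.0 - m) * p2 + m * p1


def conformist_step(p, d):
    """Conformist transmission: the locally common variant gains ground.

    Assumed form: p' = p + d * p * (1 - p) * (2p - 1). With d = 0 the
    trait frequency is transmitted unchanged.
    """
    return p + d * p * (1.0 - p) * (2.0 * p - 1.0)


def simulate(p1, p2, m, d, generations):
    """Alternate migration and conformist transmission in both groups."""
    for _ in range(generations):
        p1, p2 = migrate(p1, p2, m)
        p1, p2 = conformist_step(p1, d), conformist_step(p2, d)
    return p1, p2


# Same starting point and migration rate, with and without conformity.
print(simulate(0.9, 0.1, m=0.05, d=0.0, generations=200))
print(simulate(0.9, 0.1, m=0.05, d=0.3, generations=200))
```

With no bias, migration alone drives both groups to the same frequency; with a sufficiently strong bias relative to migration, the two groups settle at different frequencies, leaving variation on which selection among groups could act.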


Evolution of Cooperative Institutions 

Here we summarize our theory of institutional evolution, developed elsewhere 
in more detail (Richerson and Boyd, 1998, 1999), which is rooted in a mathe- 
matical analysis of the processes of cultural evolution and is consistent with 
much empirical data. We make limited claims for this particular hypothesis, 
although we think that the thrust of the empirical data as summarized by the
stylized facts is much harder on current alternatives. We make a much stronger
claim that a dual gene-culture theory of some kind will be necessary to account 
for the evolution of human cooperative institutions. 

Understanding the evolution of contemporary human cooperation requires 
attention to two different timescales: first, a long period of evolution in the 
Pleistocene epoch shaped the innate “social instincts” that underpin modern 
human behavior. During this period, much genetic change occurred as a result 
of humans living in groups with social institutions heavily influenced by culture, 
including cultural group selection (Richerson and Boyd, 2001). On this time- 
scale, genes and culture coevolve, and cultural evolution is plausibly a leading 
rather than lagging partner in this process. We sometimes refer to the process 
as “culture-gene coevolution.” Then, only about 10,000 years ago, the origins 
of agricultural subsistence systems laid the economic basis for revolutionary 
changes in the scale of social systems. Evidence suggests that genetic changes 
in the social instincts over the last 10,000 years are insignificant. Evolution of 
complex societies, however, has involved the relatively slow cultural accumu- 
lation of institutional “work-arounds” that take advantage of a psychology 
evolved to cooperate with distantly related and unrelated individuals belonging 
to the same symbolically marked “tribe” while coping more or less successfully 
with the fact that these social systems are larger, more anonymous, and more 
hierarchical than the late Pleistocene tribal-scale systems. 5 

Tribal Social Instincts Hypothesis 

Our hypothesis is premised on the idea that selection between groups plays a 
much more important role in shaping culturally transmitted variation than it 
does in shaping genetic variation. As a result, humans have lived in social en- 
vironments characterized by high levels of cooperation for as long as culture has 
played an important role in human development. To judge from the other living 
apes, our remote ancestors had only rudimentary culture (Tomasello, 1999) and 
lacked cooperation on a scale larger than groups of close kin (Boehm, 1999). The 
difficulty of constructing theoretical models of group selection on genes favoring 
cooperation matches neatly with the empirical evidence that cooperation in 
most social animals is limited to kin groups. In contrast, rapid cultural adaptation 
can lead to ample variation among groups whenever multiple stable social 
equilibria arise. At least two cultural processes can maintain multiple stable 
equilibria: (1) conformist social learning and (2) moralistic enforcement of norms. 
Such models of group selection are relatively powerful because they require only 
the social, not physical, extinction of groups. Formal theoretical models suggest 
that conformism is an adaptive heuristic for biasing imitation under a wide 
variety of conditions (Boyd and Richerson, 1985, ch. 7; Henrich and Boyd, 1998; 
Simon, 1990), and both field and laboratory work provide empirical support 
(Henrich, 2001). Models of moralistic punishment (Boyd and Richerson, 1992b; 
Boyd, Gintis, Bowles, and Richerson, 2003; Henrich and Boyd, 2001) lead to 
multiple stable social equilibria and to reductions in noncooperative strategies if 
punishment is prosocial. As a consequence, we believe, a growing reliance on 
cultural evolution led to larger, more cooperative societies among humans over 
the last 250,000 years or so. 
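How moralistic punishment can produce more than one stable equilibrium can be seen in a deliberately minimal sketch. It is not the model of any of the papers cited above. It assumes a single group with two strategies, cooperating punishers at frequency x and defectors, and payoff-biased imitation within the group. The payoff terms c, k, and p, the imitation strength beta, and the function names are all hypothetical, chosen only for illustration. The shared benefit of cooperation goes to everyone alike and so cancels out of the payoff comparison.

```python
# Illustrative sketch only: payoffs and names are assumptions, not the cited models.

def payoff_advantage(x, c=1.0, k=1.0, p=4.0):
    """Payoff of a cooperating punisher minus payoff of a defector when a
    fraction x of the group are punishers.
    c: cost of cooperating
    k: cost of punishing, scaled by the fraction of defectors (1 - x)
    p: cost of being punished, scaled by the fraction of punishers x"""
    return -c - k * (1.0 - x) + p * x


def run(x, steps=500, beta=0.1):
    """Payoff-biased imitation (a replicator-style update) within one group."""
    for _ in range(steps):
        x = x + beta * x * (1.0 - x) * payoff_advantage(x)
    return x


if __name__ == "__main__":
    # With these assumed payoffs the tipping point is x = (c + k) / (p + k) = 0.4.
    print("start above the tipping point:", run(0.5))
    print("start below the tipping point:", run(0.3))
```

Under these assumptions a group that starts above the tipping point converges on near-universal cooperation enforced by punishment, while a group that starts below it converges on universal defection. Both outcomes are stable against rare deviants, which is the property that lets otherwise similar groups differ persistently.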

Ethnographic evidence suggests that small-scale human societies are subject 
to group selection of the sort needed to favor cooperation at a tribal scale. Soltis 
et al. (1995) analyzed ethnographic data on the results of violent conflicts among 
Highland New Guinea clans. These conflicts fairly frequently resulted in the 
social extinction of clans. Many of the details of this process are consistent with 
cultural group selection. For example, social extinction does not mean physical 
elimination of the entire group. Quite the contrary, most people survive defeat 
but flee as refugees to other groups, into which they are incorporated. This sort 
of extinction cannot support genetic group selection because so many of the 
defeated survive and because they would tend to carry their unsuccessful genes 
into successful groups, rapidly running down variation between groups. 
However, the effects of conformist cultural transmission combined with moralistic 
punishment make between-group cultural variation much less subject to erosion 
by migration and within-group success of uncooperative strategies than is 
true in the case of acultural organisms. 

The New Guinea cases had little information regarding the cultural variants 
that might have been favored by cultural group selection. Other examples are 
more informative in this regard. Kelly (1985) has worked out in detail the way 
bridewealth customs in the Nuer and Dinka, cattle-keeping people of the 
Southern Sudan, led to the Nuer maintaining larger tribal systems. These larger 
tribes, in turn, allowed the Nuer to field larger forces than Dinka in disputes 
between the two groups. As a result, the Nuer expanded rapidly at the expense 
of the Dinka in the nineteenth and early twentieth centuries. Here, as in New 
Guinea, many Dinka lineages survived these fights and were often assimilated 
into Nuer tribes, a process, again, highly hostile to group selection on genes. The 
larger ethnographic corpus suggests that the sort of intergroup conflict described 
by Soltis and Kelly is very common, if not ubiquitous (Keeley, 1996; Otterbein, 
1970). Darwin’s picture of a group selection process operating at the level of 
competing, symbolically marked tribal units with the outcome determined 
by differences in “patriotism, fidelity, obedience, courage, sympathy” (Darwin, 
1874, ch. 5) and the like can work, but only upon cultural, not genetic, 
variation for such traits. 

Consistent with this argument, evidence suggests that people in late Pleis- 
tocene human societies cooperated on a tribal scale (Bettinger, 1991:203-205; 
Richerson and Boyd, 1998). “Tribe” is sometimes used in a technical sense to 
include only societies with fairly elaborate institutions for organizing cooperation 
among distantly related and unrelated people. We apply the term to any insti- 
tution that organizes interfamilial cooperation, even if it is rather simple and the 
amount of cooperation organized modest. Definitional issues aside, our claim is 
controversial because the archaeological record permits only weak inferences 
about social organization and because the spectrum of social organization in 
ethnographically known hunter-gatherers is very broad (Kelly, 1995). At the 
simple end of the spectrum are “family-level” societies (Johnson and Earle, 
2000; Steward, 1955), such as the Shoshone of the Great Basin and !Kung of the 
Kalahari. Because these two groups are so simply organized, some scholars used 
them as an archetypal model for Paleolithic societies (Kelly, 1995:2). However, 
such groups are likely poor examples of the “average” Paleolithic society be- 
cause they inhabit and have adapted to marginal environments using subsistence 
strategies quite different from any known from the Paleolithic (R. Bettinger, 
personal communication). Also, we believe that the ethnographic societies used 
to exemplify the family level of organization actually have tribal institutions of 
some sophistication. 

Much evidence suggests that typical Paleolithic societies were more complex 
than the Shoshone or the !Kung. Many late Pleistocene societies emphasized 
big game hunting, often in resource-rich environments, rather than the 
plant foods emphasized in the marginal environments inhabited by Kalahari 
foragers and the Shoshone. For example, the Kalahari foragers (along with the 
Aranda in the Australian desert) anchor the low end of the distribution with 
respect to plant biomass found in regions of 23 ethnographically known nomadic 
foraging groups (Kelly, 1995:122). As Steward (1955) reports, big game hunting 
in ethnographic cases typically involves cooperation on a larger scale than plant 
collecting and small game hunting; thus, we should expect societies in the late 
Pleistocene to be more, not less, socially complex than the !Kung and Shoshone. 
In any case we think it an error to try to identify an archetypal Pleistocene so- 
ciety; most likely last glacial societies spanned as large or larger a spectrum of 
social organization as ethnographically known cases. Art and settlement size 
(several hundred people) at Upper Paleolithic sites in France and Spain suggest 
that these societies were toward the complex end of the foraging spectrum (Price 
and Brown, 1985). In Central Europe, the palisades and large housing structures 
look much more like those of the Northwest Coast Indians or big-men social 
forms of New Guinea than those of the !Kung or Shoshone (Johnson and Earle, 
2000). 

Moreover, despite the marginality of their environment, the archetypal 
family-level societies do have tribal-scale institutions for dealing with 
environmental uncertainty (Wiessner, 1984). For example, the Shoshonean peoples of 
the North American Great Basin foraged for most of the year in nuclear family 
units. Resources in the basin were not only sparse but widely scattered, mili- 
tating against aggregation into larger units during much of the year. Although 
such bands were generally politically autonomous, they were at least tenuously 
linked into larger units. In regard to the Shoshoneans, Steward (1955:109) 
remarks that the “nuclear families have always co-operated with other families in 
various ways. Since this is so, the Shoshoneans, like other fragmented family 
groups, represent the family level of sociocultural integration only in a relative 
sense.” Winter encampments of 20 or 30 families were the largest aggregations 
among Shoshoneans; however, these were not formal organizations but rather 
aggregations of convenience. Aside from visiting, some cooperative ventures, 
such as dances (fandangos), rabbit drives, and occasional antelope drives, were 
organized during winter encampments. The number of families that a given 
family might camp with over a period of years was also not fixed, although peo- 
ple preferred to camp with people speaking the same dialect (R. Bettinger, 
personal communication). Steward’s picture of the simplicity of the Shoshone 
has been challenged. Thomas, Pendleton, and Cappannari (1986:278) observe 
that, at best, Steward’s characterization applied only to limiting cases, as, indeed, 
his frank use of them to imperfectly exemplify an ideal type suggests. Murphy 
and Murphy (1986), citing the case of the Northern Shoshone and Bannock, 
argue that the unstructured fluidity of Shoshonean society conceals a sophisti- 
cated adaptation to the sparse and uncertain resources of the Great Basin. The 
Shoshoneans maintained peace among themselves over a very large region, en- 
abling families and small groups of families to move over vast distances in re- 
sponse to local feast and famine. When local resources permitted and necessity 
required, they were able to assemble considerable numbers of people for col- 
lective purposes. Murphy and Murphy cite the formation of war parties num- 
bering in the hundreds to contest bison hunting areas with the Blackfeet. Indeed, 
the Shoshone and their relatives were relatively recent immigrants to the Great 
Basin who pushed out societies that were probably socially more complex 
but less well adapted to the sparse Great Basin environment (Bettinger and 
Baumhoff, 1982). Murphy and Murphy summarize by saying “the Shoshone are 
a ‘people’ in the truest sense of the word” (p. 92). Compared to our great ape 
relatives, and presumably our remoter ancestors, Shoshonean families main- 
tained generally friendly relations with a rather large group of other families, 
could readily strike up cooperative relations with strangers of their ethnic group, 
and organized cooperative activities on a considerable scale. 

We believe that the human capacity to live in larger-scale forms of tribal 
social organization evolved through a coevolutionary ratchet generated by the 
interaction of genes and culture. Rudimentary cooperative institutions favored 
genotypes that were better able to live in more cooperative groups. Those in- 
dividuals best able to avoid punishment and acquire the locally relevant norms 
were more likely to survive. At first, such populations would have been only 
slightly more cooperative than typical nonhuman primates. However, genetic 
changes, leading to moral emotions, like shame and a capacity to learn and in- 
ternalize local practices, would allow the cultural evolution of more sophisticated 
institutions that in turn enlarged the scale of cooperation. These successive 
rounds of coevolutionary change continued until eventually people were equipped 
with capacities for cooperation with distantly related people, emotional 
attachments to symbolically marked groups, and a willingness to punish others for 
transgression of group rules. Mechanisms by which cultural institutions might 
exert forces tugging in this direction are not far to seek. People are likely to 
discriminate against genotypes that are incapable of conforming to cultural norms 
(Richerson and Boyd, 1989; Laland, Kumm, and Feldman, 1995). People who 
cannot control their self-serving aggression end up exiled or executed in 
small-scale societies and imprisoned in contemporary ones. People whose social skills 
embarrass their families will have a hard time attracting mates. Of course, selfish 
and nepotistic impulses were never entirely suppressed; our genetically trans- 
mitted evolved psychology shapes human cultures, and, as a result, cultural ad- 
aptations often still serve the ancient imperatives of inclusive genetic fitness. 
However, cultural evolution also creates new selective environments that build 
cultural imperatives into our genes. 

Paleoanthropologists believe that human cultures were essentially modern 
by the Upper Paleolithic, 50,000 years ago (Klein, 1999, ch. 7), if not much 
earlier (McBrearty and Brooks, 2000). Thus, even if the cultural group selection 
process began as late as the Upper Paleolithic, such social selection could easily 
have had extensive effects on the evolution of human genes through this process. 
More likely, Upper Paleolithic societies were the culmination of a long period of 
coevolutionary increases in a tendency toward tribal social life. 6 

We suppose that the resulting “tribal instincts” are something like princi- 
ples in the Chomskian linguists’ “principles and parameters” view of language 
(Pinker, 1994). Innate principles furnish people with basic predispositions, 
emotional capacities, and social dispositions that are implemented in practice 
through highly variable cultural institutions, the parameters. People are innately 
prepared to act as members of tribes, but culture tells us how to recognize who 
belongs to our tribes; what schedules of aid, praise, and punishment are due to 
tribal fellows; and how the tribe is to deal with other tribes: allies, enemies, and 
clients. The division of labor between innate and culturally acquired elements is 
poorly understood, and theory gives little guidance about the nature of the 
synergies and trade-offs that must regulate the evolution of our psychology 
(Richerson and Boyd, 2000a). The fact that human-reared apes cannot be so- 
cialized to behave like humans guarantees that some elements are innate. Con- 
trarily, the diversity and sometimes rapid change of social institutions guarantee 
that much of our social life is governed by culturally transmitted rules, skills, and 
even emotions. We beg the reader’s indulgence for the necessarily brief and as- 
sertive nature of our argument here. The rationale and ethnographic support for 
the tribal instincts hypothesis are laid out in more detail in Richerson and Boyd 
(1998, 1999); for a review of the broad spectrum of empirical evidence sup- 
porting the hypothesis, see Richerson and Boyd (2001). 

Work-around Hypothesis 

Contemporary human societies differ drastically from the societies in which our 
social instincts evolved. Pleistocene hunter-gatherer societies were comparatively 
small, egalitarian, and lacking in powerful institutionalized leadership. By 
contrast, modern societies are large, inegalitarian, and have coercive leadership 
institutions (Boehm, 1993). If the social instincts hypothesis is correct, our innate 
social psychology furnishes the building blocks for the evolution of complex 
social systems, while simultaneously constraining the shape of these systems 
(Salter, 1995). To evolve large-scale, complex social systems, cultural evolu- 
tionary processes, driven by cultural group selection, take advantage of whatever 
support these instincts offer. For example, families willingly take on the essential 
roles of biological reproduction and primary socialization, reflecting the ancient 
and still powerful effects of selection at the individual and kin level. At the same 
time, cultural evolution must cope with a psychology evolved for life in quite dif- 
ferent sorts of societies. Appropriate larger-scale institutions must regulate the 
constant pressure from smaller groups (coalitions, cabals, cliques) to subvert 
rules favoring large groups. To do this, cultural evolution often makes use of 
“work-arounds.” It mobilizes the tribal instincts for new purposes. For example, 
large national and international (e.g., great religions) institutions develop 
ideologies of symbolically marked inclusion that often fairly successfully engage 
the tribal instincts on a much larger scale. Military and religious organizations 
(e.g., Catholic Church), for example, dress recruits in identical clothing (and 
haircuts) loaded with symbolic markings and then subdivide them into small 
groups with whom they eat and engage in long-term repeated interaction. Such 
work-arounds are often awkward compromises, as is illustrated by the existence 
of contemporary societies handicapped by narrow, destructive loyalties to small 
tribes (West, 1941) and even to families (Banfield, 1958). In military and reli- 
gious organizations, excessive within-group loyalty often subverts higher-level 
goals. If this picture of the innate constraints on current institutional evolution is 
correct, it is evidence for the existence of tribal social instincts that buttress the 
uncertain inferences from ethnography and archaeology about late Pleistocene 
societies. Complex societies are, in effect, grand natural social-psychological 
experiments that stringently test the limits of our innate dispositions to coop- 
erate. We expect the social institutions of complex societies to simulate life in 
tribal-scale societies in order to generate cooperative “lift.” We also expect that 
complex institutions will accept design compromises to achieve such “lift,” 
which would be unnecessary if innate constraints of a specifically tribal structure 
were absent. 


Coercive Dominance 

The cynics’ favorite mechanism for creating complex societies is command 
backed up by force. The conflict model of state formation has this character 
(Carneiro, 1970), as does Hardin’s (1968) recipe for commons management. 

Elements of coercive dominance are no doubt necessary to make complex 
societies work. Tribally legitimated self-help violence is a limited and expensive 
means of altruistic coercion. Complex human societies have to supplement 
the moralistic solidarity of tribal societies with formal police institutions. Oth- 
erwise, the large-scale benefits of cooperation, coordination, and division of labor 
would cease to exist in the face of selfish temptations to expropriate them by 
individuals, nepotists, cabals of reciprocators, organized predatory bands, greedy 
capitalists, and classes or castes with special access to means of coercion. At the 
same time, the need for organized coercion as an ultimate sanction creates roles, 
classes, and subcultures with the power to turn coercion to narrow advantage. 
Social institutions of some sort must police the police so that they will act in 
the larger interest to a measurable degree. Indeed, Boehm (1993) notes that the 
egalitarian social structure of simple societies is itself an institutional achieve- 
ment by which the tendency of some to try to dominate others on the typical 
primate pattern is frustrated by the ability of the individuals who would be 
dominated to collaborate to enforce rules against dominant behavior. Such po- 
licing is never perfect and, in the worst cases, can be very poor. The fact that 
leadership in complex systems always leads to at least some economic inequality 
suggests that narrow interests, rooted in individual selfishness, kinship, and, 
often, the tribal solidarity of the elite, always exert an influence. The use of 
coercion in complex societies offers excellent examples of the imperfections in 
social arrangements traceable to the ultimately irresolvable tension of more 
narrowly selfish and more inclusively altruistic instincts. 

While coercive, exploitative elites are common enough, we suspect that no 
complex society can be based purely on coercion for two reasons: (1) coercion of 
any great mass of subordinates requires that the elite class or caste be itself a 
complex, cooperative venture; (2) defeated and exploited peoples seldom accept 
subjugation as a permanent state of affairs without costly protest. Deep feelings of 
injustice generated by manifestly inequitable social arrangements move people to 
desperate acts, driving the cost of dominance to levels that cripple societies in the 
short run and often cannot be sustained in the long run (Insko et al., 1983; 
Kennedy, 1987). Durable conquests, such as those leading to the modern European 
national states, Han China, or the Roman Empire, leaven raw coercion with 
other institutions. The Confucian system in China and the Roman legal system in 
the West were far more sophisticated institutions than the highly coercive sys- 
tems sometimes set up by predatory conquerors and even domestic elites. 


Segmentary Hierarchy 

Late Pleistocene societies were undoubtedly segmentary in the sense that supra- 
band ethnolinguistic units served social functions. The segmentary principle can 
serve the need for more command and control by hardening lines of authority 
without disrupting the face-to-face nature of proximal leadership in egalitarian 
societies. The Polynesian ranked lineage system illustrates how making political 
offices formally hereditary according to a kinship formula can help deepen and 
strengthen a command and control hierarchy (Kirch, 1984). A common method 
of deepening and strengthening the hierarchy of command and control in com- 
plex societies is to construct a nested hierarchy of offices, using various mixtures 
of ascription and achievement principles to staff the offices. Each level of the 
hierarchy replicates the structure of a hunting and gathering band. A leader at 
any level interacts mainly with a few near-equals at the next level down in the 
system. New leaders are usually recruited from the ranks of subleaders, often 
tapping informal leaders at that level. As Eibl-Eibesfeldt (1989) remarks, even 
high-ranking leaders in modern hierarchies adopt much of the humble headman’s 
deferential approach to leadership. Henrich and Gil-White’s (2001) work 
on prestige provides a coevolutionary explanation for this phenomenon. 

The hierarchical nesting of social units in complex societies gives rise to 
appreciable inefficiencies (Miller, 1992). In practice, brutal sheriffs, incompetent 
lords, venal priests, and their ilk degrade the effectiveness of social organizations 
in complex societies. Squires (1986) dissects the problems and potentials of 
modern hierarchical bureaucracies to perform consistently with leaders’ inten- 
tions. Leaders in complex societies must convey orders downward, not just seek 
consensus among their comrades. Devolving substantial leadership responsibility 
to subleaders far down the chain of command is necessary to create small-scale 
leaders with face-to-face legitimacy. However, it potentially generates great fric- 
tion if lower-level leaders either come to have different objectives than the upper 
leadership or are seen by followers as equally helpless pawns of remote leaders. 
Stratification often creates rigid boundaries so that natural leaders are denied 
promotion above a certain level, resulting in inefficient use of human resources 
and a fertile source of resentment to fuel social discontent. 

On the other hand, failure to articulate properly tribal-scale units with more 
inclusive institutions is often highly pathological. Tribal societies often must 
live with chronic insecurity due to intertribal conflicts. One of us once attended 
the Palio, a horse race in Siena in which each ward, or contrada, in this small 
Tuscan city sponsors a horse. Voluntary contributions necessary to pay the rider, 
finance the necessary bribes, and host the victory party amount to a half a million 
dollars. The contrada clearly evoke the tribal social instincts: they each have a 
totem — the dragon, the giraffe — special colors, rituals, and so on. The race excites 
a tremendous, passionate rivalry. One can easily imagine medieval Siena in 
which swords clanged and wardmen died, just as they do or did in warfare be- 
tween New Guinea tribes (Rumsey, 1999), Greek city-states (Runciman, 1998), 
inner-city street gangs (Jankowski, 1991), and ethnic militias. 


Exploitation of Symbolic Systems 

The high population density, division of labor, and improved communication 
made possible by the innovations of complex societies increased the scope for 
elaborating symbolic systems. The development of monumental architecture to 
serve mass ritual performances is one of the oldest archaeological markers of 
emerging complexity. Usually an established church or less formal ideological 
umbrella supports a complex society’s institutions. At the same time, complex 
societies exploit the symbolic ingroup instinct to delimit a quite diverse array of 
culturally defined subgroups, within which a good deal of cooperation is rou- 
tinely achieved. Ethnic group-like sentiments in military organizations are often 
most strongly reinforced at the level of 1,000-10,000 or so men (British and 
German regiments, U.S. divisions; Kellett, 1982). Typical civilian symbolically 
marked units include nations, regions (e.g., Swiss cantons), organized tribal 
elements (Garthwaite, 1993), ethnic diasporas (Curtin, 1984), castes (Srinivas, 
1962; Gadgil and Guha, 1992), large economic enterprises (Fukuyama, 1995), 
and civic organizations (Putnam, Leonardi, and Nanetti, 1993). 

How units as large as modern nations tap into the tribal social instincts is 
an interesting issue. Anderson (1991) argues that literate communities, and the 
social organizations revolving around them (e.g., Latin literates and the Catholic 
Church), create “imagined communities,” which in turn elicit significant commitment 
from members of the community. Since tribal societies were often large 
enough that some members were not known personally to any given person, 
common membership would sometimes have to be established by the mutual 
discovery of shared cultural understandings, as simple as the discovery of a 
shared language in the case of the Shoshone. The advent of mass literacy and 
print media — Anderson stresses newspapers — made it possible for all speakers 
of a given vernacular to have confidence that all readers of the same or related 
newspapers share many cultural understandings, especially when organizational 
structures such as colonial government or business activities really did give 
speakers some institutions in common. Nationalist ideologists quickly discovered 
the utility of newspapers for building imagined communities, typically several 
contending variants of the community, making nations the dominant quasi-tribal 
institution in most of the modern world. 

Many problems and conflicts revolve around symbolically marked groups in 
complex societies. Official dogmas often stultify desirable innovations and lead to 
bitter conflicts with heretics. Marked subgroups often have enough tribal cohe- 
sion to organize at the expense of the larger social system. The frequent seizure of 
power by the military in states with weak institutions of civil governance is 
probably a by-product of the fact that military training and segmentation, often 
based on some form of patriotic ideology, are conducive to the formation of 
relatively effective large-scale institutions. Wherever groups of people interact 
routinely, they are liable to develop a tribal ethos. In stratified societies, powerful 
groups readily evolve self-justifying ideologies that buttress treatment of subor- 
dinate groups, ranging from neglectful to atrocious. White American Southerners 
had elaborate theories to justify slavery, and pioneers everywhere found the 
brutal suppression of Indian societies legitimate and necessary. The parties and 
interest groups that vie to sway public policy in democracies have well-developed 
rationalizations for their selfish behavior. A major difficulty with loyalties in- 
duced by appeals to shared symbolic culture is the very language-like productivity 
possible with this system. Dialect markers of social subgroups emerge rapidly 
along social fault-lines (Labov, 2001). Charismatic innovators regularly launch 
new belief and prestige systems, which sometimes make radical claims on the 
allegiance of new members, sometimes make large claims at the expense of ex- 
isting institutions, and sometimes grow explosively. Or larger loyalties can arise, 
as in the case of modern nationalisms overriding smaller-scale loyalties, some- 
times for the better, sometimes for the worse. The ongoing evolution of social 
systems can develop in unpredictable, maladaptive directions by such processes 
(Putnam, 2000). The worldwide growth of fundamentalist sects that challenge 
the institutions of modern states is a contemporary example (Marty and Appleby, 
1991). If T. Wolfe (1965) is right, mass media can be the basis of a rich diversity 
of imagined subcommunities using such vehicles as specialized magazines, 
newsletters, and websites. The potential of deviant subgroups, such as sectarian 
terrorist organizations, to use modern media to create small but highly motivated 
imagined communities is an interesting variant on Anderson’s theory. Ongoing 
cultural evolution is impossible to control wholly in the larger interest, or at 
least impossible to control completely, and forbidding free evolution tends to deprive 
societies of the “civic culture” that spontaneously produces so many collective 
benefits. 


Legitimate Institutions 

In small-scale egalitarian societies, individuals have substantial autonomy and 
considerable voice in community affairs, and can enforce fair, responsive, even 
self-effacing behavior and treatment from leaders (Boehm, 1999). At their most 
functional, symbolic institutions, a regime of tolerably fair laws and customs, 
effective leadership, and smooth articulation of social segments can roughly 
simulate these conditions in complex societies. Rationally administered bu- 
reaucracies, lively markets, the protection of socially beneficial property rights, 
widespread participation in public affairs, and the like provide public and private 
goods efficiently, along with a considerable amount of individual autonomy. 
Many individuals in modern societies feel themselves part of culturally labeled 
tribal-scale groups, such as local political party organizations, that have influence 
on the remotest leaders. In older complex societies, village councils, local no- 
tables, tribal chieftains, or religious leaders often hold courts open to humble 
petitioners. These local leaders, in turn, represent their communities to higher 
authorities. To obtain low-cost compliance with management decisions, ruling 
elites have to convince citizens that these decisions are in the interest of the 
larger community. As long as most individuals trust that existing institutions are 
reasonably legitimate and that any felt needs for reform are achievable by means 
of ordinary political activities, there is considerable scope for large-scale col- 
lective social action. 

Legitimate institutions, however, and trust of them, are the result of an 
evolutionary history and are easy neither to manage nor to engineer. Social distance
between different classes, castes, occupational groups, and regions is objectively 
great. Narrowly interested tribal-scale institutions abound in such societies. 
Some of these groups have access to sources of power that they are tempted to 
use for parochial ends. Such groups include, but are not restricted to, elites. The 
police may abuse their power. Petty administrators may victimize ordinary 
citizens and cheat their bosses. Ethnic political machines may evict historic elites 
from office but use chicanery to avoid enlarging their coalition. 

Without trust in institutions, conflict replaces cooperation along fault lines 
where trust breaks down. Empirically, the limits of the trusting community
define the universe of easy cooperation (Fukuyama, 1995). At worst, trust does not
extend outside the family (Banfield, 1958), and potential for cooperation on a larger
scale is almost entirely forgone. Such communities are unhappy as well as poor.
Trust varies considerably in complex societies, and variation in trust seems to be 
the main cause of differences in happiness across societies (Inglehart and Rabier, 
1986). Even the most efficient legitimate institutions are prey to manipulation
by small-scale organizations and cabals, the so-called special interests of modern
democracies. Putnam et al.’s (1993) contrast between civic institutions in
Northern and Southern Italy illustrates the difference that a tradition of functional
institutions can make. The democratic form of the state, pioneered by
Western Europeans in the last couple of centuries, is a powerful means of creating 
generally legitimate institutions. Success attracts imitation all around the world. 
The halting growth of the democratic state in countries ranging from Germany to 
sub-Saharan Africa is testimony that legitimate institutions cannot be drummed 
up out of the ground just by adopting a constitution. Where democracy has taken 
root outside of the European cultural orbit, it is distinctively fitted to the new 
cultural milieu, as in India and Japan. 


Conclusion 

The processes of cultural evolution quite plausibly led to group selection being a
more powerful force on cultural than on genetic variation. The cultural system
of inheritance probably arose in the human lineage as an adaptation to the in- 
creasingly variable environments of the recent past (Richerson and Boyd, 2000a, 
b). Theoretical models show that the specific structural features of cultural sys- 
tems, such as conformist transmission, have ordinary adaptive advantages. We 
imagine that these adaptive advantages favored the capacity for a system that 
could respond rapidly and flexibly to environmental variation in an ancestral 
creature that was not particularly cooperative. As a by-product, cultural evolution 
happened to favor large-scale cooperation. Over a long period of coevolution, 
cultural pressures reshaped “human nature,” giving rise to innate adaptations to 
living in tribal-scale social systems. Humans became prepared to use systems of 
legitimate punishment to lower the fitness of deviants, for example. We believe 
that the cultural explanation for human cooperation is in accord with much evidence,
as summarized by the stylized facts about human cooperation with which we
introduced our remarks. More detailed surveys of the concordance of our conjectures
with various bodies of data may be found in Richerson and Boyd (1999,
2001) and Richerson, Boyd, and Paciotti (2002). 

Regardless of the fate of any particular proposals, we think that explanations 
of human cooperation have to thread some rather tight constraints. They have to 
somehow finesse the awkward fact that humans, at least partly because of our 
ability to cooperate with distantly related people in large groups, are a huge 
success yet quite unique in our style of social life. If a mechanism like indirect 
reciprocity works, why have not many social species used it to extend their range 
of cooperation? If finding self-reinforcing solutions to coordination games is mostly 
what human societies are about, why do not other animals have massive coordi- 
nation-based social systems? If reputations for pairwise cooperation are easy to 
observe or signal (but unexploitable by deceptive defectors), why have we found 
no other complex animal societies based on this principle? By contrast, we do find 
plenty of complex animal societies built on the principle of inclusive fitness. 

The unique pattern of cooperation of our species suggests that human co- 
operation is likely to derive from some other unique feature or features of human 
life. Advanced capacities for social learning are also unique to humans; thus, 
culture is, prima facie, a plausible key element in the evolution of human 
cooperation. Our argument depends upon the existence of culture and group
selection on cultural variation. Since sophisticated culture is unique to humans, 
we do not expect this mechanism to operate in other species. Ours is not the only 
hypothesis that passes this basic test. For example, E. Smith’s (2003) signaling 
hypothesis depends upon language, another unique feature of the human species. 
E. Hagen made a similar proposal in his comment on our background paper. He 
argued that the inventiveness of humans combined with language as a cheap 
communication device adapts us to solve problems of cooperation. We think that 
hypotheses in this vein, like Alexander’s proposed indirect reciprocity mecha- 
nism, cannot be decisively rejected, but they are far from completely specified. 
What is it that biases invention and cheap talk in favor of cooperative rather than 
selfish ends? The intuition that cheap talk, symbolic rewards, and clever institutions 
are in themselves sufficient to explain human cooperation probably comes from the 
common experience that people do find it rather easy to use such devices to 
cooperate (e.g., Ostrom, Gardner, and Walker, 1994). The difficult question is 
whether these are backed up by unselfish motives on the part of at least some 
people. A literal interpretation of experiments such as those of Fehr and Gächter
(2002) and Batson (1991) suggests that unselfish motives play important roles. 
However, unselfish motives may be a proximal evolutionary result of an ultimate 
indirect reciprocity sort of evolutionary process rather than the result of a group 
selection mechanism. Those who attempt deception in a world of clever co- 
operators may simply expose their lack of cleverness, so that the best strategy is an 
unfeigned willingness to cooperate. The data suggesting that cultural group selection
is an appreciable process (Soltis et al., 1995) are also not definitive, since that
process could be weak relative to some competing process of the indirect reciprocity sort.

Another complication is that hypotheses leaning on language, technology, 
and intelligence are appealing to phenomena with considerable cultural content. 
The evolution of technology and the diffusion of innovations are cultural pro- 
cesses that depend upon institutions and a sophisticated social psychology 
(Henrich, 2001). Both the cultural and genetic evolution of our cognitive ca- 
pacities (some of which gave rise to language) likely emerged from a culture- 
gene coevolutionary process (Henrich and McElreath, 2002; Tomasello, 1999). 
Thus, these hypotheses are not, we submit, clean alternatives to the cultural 
group selection hypothesis, absent further specification. In the future, we expect 
that competing hypotheses will be developed in sufficient detail that more 
precise comparative empirical tests will be possible. 

For example, even if innatist linguists are correct that much of what we need 
to know to speak is innate, we wonder why more is not innate. Why is it that
mutually unintelligible languages arise so rapidly? Would not we be better off if 
everyone spoke the same common entirely innate language? Not necessarily. Very 
often people from distant places are likely to have evolved different ways of doing 
things that are adaptive at home but not abroad. Similarly, avoiding listening to 
people is a wise idea if they are proposing a behavior deviant from locally pre- 
vailing coordination equilibria. Cultural evolution can run up adaptive barriers to 
communication quite readily if listening to foreigners makes one liable to acquire 
erroneous ideas (McElreath, Boyd, and Richerson, 2003). Dialect evolution seems
to be a highly nuanced system for regulating communication within languages 
as well as between them, although the adaptive significance of dialect is hardly
well worked out (Labov, 2001). Interestingly, in McElreath et al.’s model, using a
symbolic signal to express a willingness to cooperate cannot support the evolution 
of a symbolic marker of group membership because defectors as well as potential 
cooperators will be attracted by the signal. A symbolic system can be used to 
communicate intention to cooperate only if potential cooperative partners can 
exchange trustworthy signals. Once symbolic markers become sufficiently complex
to be unfakable by defectors and a sufficiently large pool of relatively
anonymous but trustworthy signalers exists, cheap signals will be useful.
Dialect is difficult to fake although cheap to use, and once some level of coop- 
eration on a proto-tribal scale was possible, proto-languages might have come 
under selection to create unfakable signals of group membership that imply an 
intention to cooperate. We suspect that language could have evolved only in 
concert with a measure of trust of other speakers rather than being an unaided 
generator of trust. To the extent that cooperation is the game, one has no interest 
in listening to speakers whose messages are self-serving. Think of how annoying 
we find telemarketers’ speech acts. Sociolinguists make much of the concept that
speech is a cooperative system and argue that the empirical structure of conversation
is consistent with this assumption (Wardhaugh, 1992). Language seems
to presuppose cooperation as much as it in turn facilitates cooperation. 

That technology, like language, is one of the major components of the human
adaptation is undeniable. It opens up opportunities to gain advantage from
cooperation in hunting and defense and to exploit the possibilities of the division
of labor. What is less well understood is the extent to which technology is likely 
a product of large-scale social systems. Henrich (2004b) has analyzed models of 
the “Tasmanian Effect.” At the time of European contact, the Tasmanians had 
the simplest toolkit ever recorded in an extant human society; it was, for ex- 
ample, substantially simpler than the toolkits of ethnographically known foragers 
in the Kalahari and Tierra del Fuego, as well as those associated with human 
groups from the Upper Paleolithic. Archaeological evidence indicates that 
Tasmanian simplicity resulted from both the gradual loss of items from their 
own pre-Holocene toolkit and the failure to develop many of the technologies 
that subsequently arose only 150 km to the north in Australia. The loss likely 
began after the Bass Strait was flooded by rising post-glacial sea levels (Jones, 
1995). Henrich's analysis indicates that imperfect inference during social 
learning, rather than stochastic loss due to drift-like effects, is the most likely 
reason for this loss. This suggests that to maintain an equilibrium toolkit as com- 
plex as those of late Pleistocene hunter-gatherers likely required a rather large 
population of people who interacted fairly freely so that rare, highly skilled 
performances, spread by selective imitation, could compensate for the routine 
loss of skills due to imperfect inference. Neanderthals and perhaps other archaic 
human populations had large brains but simple toolkits. The Tasmanian Effect
may explain why. Archaeology suggests that Neanderthal population densities 
were lower than those of the modern humans that replaced them in Europe and 
that they had less routine contact with their neighbors, as evidenced by shorter 
distance movement of high-quality raw materials from their sources compared 
to those for modern humans (Klein, 1999). 
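
The population-size argument above can be made concrete with a minimal sketch. It is only loosely in the spirit of the models the text attributes to Henrich and does not reproduce his specification. The assumptions are ours: every learner copies the single most skilled performer of the previous generation, each copy falls short of the model by an amount drawn from a Gumbel distribution, and the names alpha (typical loss from imperfect inference), beta (spread of outcomes among learners), and n (number of interacting learners) are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_change(n, alpha, beta):
    """Expected per-generation change in the best skill level.

    Assumption: each of n learners ends up with the model's skill plus a
    Gumbel(location=-alpha, scale=beta) draw. The maximum of n such draws
    has mean -alpha + beta * (ln n + gamma).
    """
    return -alpha + beta * (math.log(n) + EULER_GAMMA)


def critical_population(alpha, beta):
    """Population size at which expected gains exactly offset copying losses."""
    return math.exp(alpha / beta - EULER_GAMMA)


def simulate(n, alpha, beta, generations, seed=0):
    """Track the best skill level when everyone copies the previous best."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(generations):
        draws = []
        for _ in range(n):
            # If E ~ Exp(1), then -alpha - beta * ln(E) is Gumbel(-alpha, beta).
            e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
            draws.append(-alpha - beta * math.log(e))
        best += max(draws)  # the next generation's model is the best copier
    return best


if __name__ == "__main__":
    print(critical_population(4.0, 1.0))   # threshold number of learners
    print(simulate(200, 4.0, 1.0, 200))    # above threshold: skill accumulates
    print(simulate(5, 4.0, 1.0, 200))      # below threshold: skill erodes
```

Under these assumptions, skill accumulates only when the pool of interacting learners exceeds the threshold; a smaller or more isolated population loses skill each generation even though every individual learns as well as before.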
The proposal that human intelligence is at the root of human cooperation is
difficult to evaluate because of the ambiguity in what we might mean by intelligence
in a comparative context (Hinde, 1970:659-663). As the Tasmanian Effect
illustrates, individual human intelligence is only a part, and perhaps only a small 
part, of being able to create complex adaptive behaviors. In fact, we think “intel- 
ligence” plays little role in the emergence of many human complex adaptations. 
Instead, humans seem to depend upon socially learned strategies to finesse the 
shortcomings of their cognitive capabilities (Nisbett and Ross, 1980). The details
of human cognitive abilities apparently vary substantially across cultures because
culturally transmitted cognitive styles differ (Nisbett et al., 2001). Although we
share the common intuition that humans are individually more intelligent than 
even our very clever fellow apes, we are not aware of any experiments that suffi- 
ciently control for our cultural repertoires to be sure that it is correct. The concept 
of “intelligence” in individual humans perhaps makes little sense apart from their 
cultural repertoires: humans are smart in part because they can bring a variety of 
“cultural tools” (e.g., numbers, symbols, maps, various kinematic models) to bear
on problems. A hunter-gatherer would seem an incredibly stupid college professor,
but college professors would seem equally dense if forced to try to survive as
hunter-gatherers (a few knowledgeable anthropologists aside). Even abilities as
seemingly basic as those related directly to visual perception vary across cultures
(Segall, Campbell, and Herskovits, 1966). Second, intelligence implies a means to an
end, not an end in itself. Individual intelligence ought to serve the ends of both 
cooperation and defection. We suspect that defection, requiring trickery
and deception, is actually better served by intelligence than cooperation is. Game theorists
assuming perfect, but selfish, rationality predict that humans should defect in the 
one-shot anonymous prisoner’s dilemma, just as evolutionary biologists predict 
that dumb beasts using evolved predispositions will. Whiten and Byrne (1988)
characterized our social intelligence as “Machiavellian,” implying that it does indeed
serve deception equally with honesty. However, just as humans punish altruistically,
they seem also to exert their political intelligence altruistically (e.g.,
Sears and Funk, 1990), biasing the evolution of institutions accordingly. On the
basis of our brain size compared to that of other apes, Dunbar (1992) predicts that
human groups ought to number around 50. Hunter-gatherer co-residential bands 
do number around 50, but culturally transmitted institutions web together bands to 
create tribes typically numbering a few hundred to a few thousand people, as we 
have seen. Human political systems do seem to exceed in scale anything predicted 
on the basis of enhanced Machiavellian talents (supposing that such talents can on 
average increase social scale at all). The institutional basis of these systems is not far
to seek. For example, Wiessner (1984) describes how institutions of ceremonial
exchange of gifts knit the famous !Kung San bands into a much larger risk-pooling
cooperative. Australian aboriginal groups show similar functional patterns, which
are built out of quite different and substantially more elaborate sets of cultural
practices (Peterson, 1979). Underpinning such individual-to-individual bond
making is likely the kind of generalized trust that co-ethnics have for one another. If
Murphy and Murphy (1986) are correct about the Northern Shoshone, a society of
thousands constituted a functional “people” engaging in mutual aid in a hostile and 
uncertain environment on the basis of little more than a common language. In his 
classic ethnography of the Nuer, Evans-Pritchard (1940) describes how simple
tribal institutions can knit herding people into tribes numbering tens of thousands,
much larger than was possible among hunter-gatherers. The size of hunter-gatherer
societies was evidently limited by low population density, not by their relatively
unsophisticated institutions. Third, Henrich and Gil-White (2001) propose that
human prestige systems are an adaptation to facilitate cultural transmission. Social 
learning means that the returns to effort in individual learning potentially result in 
gains for many subsequent social learners who do not have to “reinvent the wheel.” 
If extra individual effort in acquiring better ideas pays off in prestige and if prestige 
leads to fitness advantages, then the social returns to effortful individual learning 
will in part be reflected in private returns to individual learners. Group selection on 
prestige systems may further enlarge the returns to investment in individual 
learning and bring returns up to a level that reflects the group optimum amount of 
effort in individual learning. If this mechanism operates, human intelligence may 
have been enhanced by social selection emanating from institutions of prestige. 7 

We propose that group selection on cultural variation is at the heart of 
human cooperation, but we certainly recognize that our sociality is a complex 
system that includes many linked components. Surely, without punishment, 
language, technology, individual intelligence and inventiveness, ready establish- 
ment of reciprocal arrangements, prestige systems, and solutions to games of 
coordination, our societies would take on a distinctly different cast, to say the 
least. Human sociality no doubt has a number of components that were neces- 
sary to its evolution and are necessary to its current functions. If such is the case, 
prime mover explanations giving pride of place to a single mechanism are vain to 
seek. Thus, a major constraint on explanations of human sociality is its systemic 
structure. Explanations have to have a plausible historical sequence tracing how 
the currently interrelated parts evolved, perhaps piecemeal. And explanations 
have to account for the current functional and dysfunctional properties of hu- 
man social systems. We are far from having completed this task. 


NOTES 

1. “Cooperation” has a broad and a narrow definition. The broad definition 
includes all forms of mutually beneficial joint action by two or more individuals. The 
narrow definition is restricted to situations in which joint action poses a dilemma for 
at least one individual such that, at least in the short run, that individual would be 
better off not cooperating. We employ the narrow definition in this chapter. The 
“cooperate” versus “defect” strategies in the prisoner’s dilemma and commons 
games anchor our concept of cooperation, making it more or less equivalent to the 
term “altruism” in evolutionary biology. Thus, we distinguish “coordination” (joint 
interactions that are “self-policing” because payoffs are highest if everyone does the 
same thing) and division of labor (joint action in which payoffs are highest if in- 
dividuals do different things) from cooperation. 

2. We refer to cultural evolution as changes in the pool of cultural variants 
carried by a population of individuals as a function of time and the processes that 
cause the changes.

3. It is not obvious that language potentiates indirect reciprocity. Whereas
superficially language may seem to promote the exchange of high-quality information
required for indirect reciprocity to favor cooperation, this addition merely changes the 
question slightly to one of why individuals would cooperate in information sharing; 
language merely recreates the same public goods dilemma. Lies about hunting success, 
for example, are difficult to check and often ambiguous. Among the Gunwinggu 
(Australian foragers), members of one band often lied to members of other bands 
about their success to avoid having to share meat (Altman and Peterson, 1988). 

4. Several prominent modern Darwinians, Hamilton (1975), Wilson (1975: 
561-562), Alexander (1987:169), and Eibl-Eibesfeldt (1982), have given serious 
consideration to group selection as a force in the special case of human ultra-sociality. 
They are impressed, as we are, by the organization of human populations into units 
that engage in sustained, lethal combat with other groups, not to mention other 
forms of cooperation. The trouble with a straightforward group selection hypothesis 
is our mating system. We do not build up concentrations of intrademic relatedness 
like social insects, and few demic boundaries are without considerable intermarriage. 
Moreover, the details of human combat are more lethal to the hypothesis of genetic 
group selection than to the human participants. For some of the most violent groups 
among simple societies, wife capture is one of the main motives for raids on 
neighbors, a process that could hardly be better designed to erase genetic variation 
between groups and stifle genetic group selection. 

5. We are aware that much controversy surrounds the use of microevolutionary 
models to explain macroevolutionary questions. Our thoughts on the issues are 
summarized in Boyd and Richerson (1992a). 

6. It would be a mistake to assume that complex technology is a prerequisite for 
tribal-level forms of social organization. At the time of European discovery, the 
Tasmanians had a technology substantially simpler than that of many Upper Paleo- 
lithic peoples: they lacked bone tools, composite spears, bows, arrows, spear 
throwers, and fish hooks, etc. Yet they lived in multiband groups, which controlled 
territories. Intertribal trade, warfare, and raiding were all commonplace (Jones, 
1995). The last 4,000 years of the Tasmanian archaeological record do not look much
different from many middle Paleolithic sites. 

7. Similarly, as Smith (2003) notes, Hawkes hypothesizes that men contribute 
to hunting success to “show off” and that showing off earns men reproductive
success in terms of sexual favors from women. Contrary to what Hawkes supposes, 
this system is a possible focus of cultural group selection. In many hunter-gatherer 
groups, meat is very widely shared and hunters often do not control its distribution. 
Personal favors granted to a successful hunter as recompense for effort will benefit all 
who share his kills. Showing that individuals who contribute heavily to the common 
good are rewarded is not evidence that group-selected effects are absent. In the end, 
group selection can succeed only if altruistic individuals on average do better than 
selfish ones. The fact that hunters are not allowed to bargain with consumers of their 
kills and yet are rewarded by consumers anyway is at least as consistent with the 
operation of group selection as with a competing individualist explanation. 
PART 4

Archaeology and 
Culture History 


Historians and scientists do not always get along very well. Many 
historians view science as a procrustean enterprise whose practitioners insist 
on shoehorning complex historical phenomena into overly simple general 
laws. For their part, scientists often think that historians exaggerate the 
complexity and contingent nature of historical events, willfully refusing to see 
the order that underlies the chaos of one thing after another. This debate is
echoed in evolutionary biology where Stephen J. Gould famously upheld a
historicist version of organic evolution, a habit that made many mainstream 
evolutionary biologists hopping mad. 

In our view, these debates are rooted in a mistaken view of evolutionary 
theory. Surely historical contingency plays a role in every sort of evolution from 
the cosmic to the cultural. The Big Bang was a singular event. So was the 
evolution of our unique species (and every other unique species, for that 
matter). However, evolutionary scientists do not try to jam this complexity 
into the straitjacket of general laws like those in physics. Instead, they aim to 
develop a toolkit of models and a collection of related empirical generalizations. 
The phenomena of evolution are not only complex but also diverse. No 
model and no empirical generalization is guaranteed to hold from one case 
to the next. Yet the lesson of biology is that this piecemeal approach to theory 
can yield deep insights. In chapter 19, we review the case for using a toolkit 
of simple models to explain complex and diverse phenomena like cultural 
evolution. Here we consider the role of theory-as-tools in understanding 
phenomena in which historical contingency plays a large, if not dominant, role. 

Chapter 15 discusses why evolutionary processes give rise to history,
meaning patterns of change with time in which the same initial conditions
result in divergent evolutionary trajectories or in which change is nonstationary.
The very simplest evolutionary models of adaptation by natural
selection give rise to trajectories that converge on unique equilibria from 
divergent initial conditions. Add simple noise or simple oscillatory mechan- 
isms in key processes and the change will never cease. But it is stationary and 
thus will remain “lawful.” However, real evolutionary trajectories do diverge 
from identical starting points and do result in patterns whose statistical 
properties are not stationary, and this fact limits the predictive power of 
evolutionary theory. The “laws” of nature are, in effect, ever-changing. In this 
chapter, we suggest a number of means by which rather straightforward 
adaptive processes can result in divergent, nonstationary change. If the argu- 
ment is correct, the scientists’ tools should prove quite useful to historians 
even if what we provide is not laws. Just demonstrating how divergence and 
nonstationarity themselves arise shows how the scientific approach can illu- 
minate historical questions at the most fundamental level. 
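
The convergent behavior of the very simplest models mentioned above can be illustrated with a minimal sketch. The two-variant recursion and the names p (frequency of the favored variant) and s (its selective advantage) are illustrative assumptions, not a model taken from the chapter.

```python
def next_frequency(p, s):
    """One generation of selection between two variants.

    The favored variant has relative fitness 1 + s and the other has 1.
    """
    return p * (1 + s) / (p * (1 + s) + (1 - p))


def trajectory(p0, s, generations):
    """Frequencies of the favored variant, starting from p0."""
    p = p0
    path = [p]
    for _ in range(generations):
        p = next_frequency(p, s)
        path.append(p)
    return path


if __name__ == "__main__":
    rare_start = trajectory(0.01, 0.1, 500)
    common_start = trajectory(0.90, 0.1, 500)
    # Very different starting points end at the same equilibrium.
    print(rare_start[-1], common_start[-1])
```

In this sketch, history leaves no trace: the end state is the same whatever the starting frequency. Divergent or nonstationary outcomes require adding something more to the model, which is the subject the chapter takes up.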

In chapter 16 we consider the problem of constructing cultural phylogenies.
Phylogenies are useful, among other things, for controlling for the effects of 
common history in scientific studies of organic and cultural evolution. In recent 
years, evolutionary biologists have made great technical strides in the science of 
phylogeny reconstruction, and these advances have promise for application to 
cultures. The difficulty is that cultures do not have the simple branching histories 
that characterize most biological species — cross-cultural diffusion occurs in 
every domain of culture. Whether this fact causes important problems for 
phylogenetic reconstruction is an open question. Historical linguists have long 
struggled with this problem with some success. Language trees are a much used 
starting point for cultural phylogeny reconstruction, despite their obvious 
limitations. In places like aboriginal western North America, groups with 
unrelated languages often have very similar subsistence systems and even similar 
political and social organization. Even the most conservative features of language 
change rapidly so that most historical linguists believe that phylogenetic 
reconstruction is possible only for the last few thousand years. Another approach 
is to consider the phylogeny of single traits or small, tightly knit clusters of traits 
rather than of cultures as a whole. However, such items may contain too little 
historical information for accurate reconstruction. Future methodological 
innovations may solve many of these problems. In the meantime, the difficulty of 
cultural phylogeny reconstruction illustrates an important point. Humans are 
one species; our genes and our culture tend to diffuse very widely. Local 
populations are seldom if ever isolated for any substantial length of time. 
Ideologues often want to use the concept of culture like the concept of race, 
imagining that their culture has a “pure” history. In fact, all cultures have 
tangled, messy histories, even messier than our genes, if that is possible. 

Chapter 17 deals with a specific historical problem, the origins of agriculture.
This phenomenon is typical of a number of problems in human
evolution in that it is a particular nonstationary pattern: it is “progressive.” 
Human technology and probably human social complexity have increased 
more or less steadily, if at greatly different rates, seemingly since our lineage 
branched from that of the other apes. The progressive pattern is especially 
marked during the last 250,000 years or so (McBrearty and Brooks, 2000).
Many scholars are not puzzled by such patterns. To them, the obvious expla- 
nation is that evolution is the process of replacing antique, less adaptive traits 
with modern, more adaptive ones. The problem is that selective processes 
usually reach equilibria too rapidly to generate long-run progress on geological 
timescales. Evolution can produce steady progress only if the processes internal 
to the evolutionary process slow it down or if the pace of evolution is set by 
external environmental factors. The origin and spread of agriculture provides 
an interesting test case because it is among the most important events in hu- 
man history, serving, as it still does, as the subsistence basis for the evolution of 
even more complex societies in the last few thousand years. Recently, the 
most popular explanations have been based on population pressure, the idea 
that humans turned to agriculture when population densities rose to the point 
that less intensive hunting and gathering techniques began to favor investment 
in agricultural production. In this chapter, we argue that population pressure 
acts at far too short a timescale to explain agricultural origins. As Malthus 
noted, population pressure builds appreciably on the generational timescale; if 
it paced cultural evolution, events would transpire at a much faster pace 
than archaeologists normally observe. Climate change is a better candidate to 
explain why agriculture first began appearing about 1 1,600 years ago. Recent 
advances in paleoclimatology have shown that last-glacial climates were ex- 
ceedingly variable compared to the period since 11,600 years ago. Climates 
in the last glacial age were also mainly drier than modern ones, and lower CO2
may also have handicapped plant production. Agricultural subsistence is dif- 
ficult in modern climates and takes several thousand years to evolve. Perhaps 
agriculture was impossible in the Pleistocene epoch. 

Our main objective in this section is not to push particular answers to 
particular historical, archaeological, and paleoanthropological problems 
(Richerson and Boyd, 2001). Rather, we want to advertise to those who 
study historical problems that cultural evolutionary theory has tools that 
students of these phenomena need in their repertoire. Even when we can- 
not say much about how evolution works, we can often use a combination 
of theory and empiricism to estimate the rates of change characteristic of 
different processes. Quite elementary considerations can sometimes rule some 
processes in and some out as candidate explanations for a given event. Just as 
astronomers need the theory of nuclear physics to understand how stars 
evolve, so historians, archaeologists, and paleoanthropologists need the theory 
of cultural evolution to understand human evolutionary history. 

15 How Microevolutionary Processes Give Rise to History


Over the last decade a number of authors, including ourselves, have 
attempted to understand human cultural variation using Darwinian methods. 
This work is unified by the idea that culture is a system of inheritance: in- 
dividuals vary in their skills, habits, beliefs, values, and attitudes, and these 
variations are transmitted to others through time by teaching, imitation, and 
other forms of social learning. To understand cultural change, we must account 
for the microevolutionary processes that increase the numbers of some cultural 
variants and reduce the numbers of others. 

Social scientists have made a number of objections to this approach to 
understanding cultural change. Among these is the idea that culture can only be 
explained historically. Because the history of any given human society is a se- 
quence of unique and contingent events, explanations of human social life, it is 
argued, are necessarily interpretive and particularistic. Present phenomena are 
best explained mainly in terms of past contingencies, not ahistorical adaptive 
processes that would erase the trace of history. Like other scientific (rather than 
historical) explanations of human cultures, the argument goes, Darwinian
models cannot account for the lack of correlation of environmental and cultural 
variation, nor the long-term trends in cultural change. 

In this chapter, we defend the Darwinian theories of cultural change against 
this objection by suggesting that several cultural evolutionary processes can give 
rise to divergent evolutionary developments, secular trends, and other features 
that can generate unique historical sequences for particular societies. We also 
argue that Darwinian theory offers useful tools for those interested in under- 
standing the evolution of particular societies. Essentially similar processes act in 
the case of organic evolution. Darwinian theory is both scientific and historical. 
The history of any evolving lineage or culture is a sequence of unique, contingent
events. Similar environments often give rise to different evolutionary trajecto- 
ries, even among initially similar taxa or societies, and some show very long-run 
trends in features such as size. Nonetheless, these historical features of organic 
and cultural evolution can result from a few simple microevolutionary processes. 

A proper understanding of the relationship between the historical and the 
scientific is important for progress in the social and biological sciences. There is 
(or ought to be) an intimate interplay between the study of the unique events of
given historical sequences and the generalizations about process constructed by 
studying many cases in a comparative and synthetic framework. The study of 
unique cases furnishes the data from which generalizations are derived, while the 
generalizations allow us to understand better the processes that operated on 
particular historical trajectories. We cannot neglect the close, critical study of 
particular cases without putting the database for generalization in jeopardy. 
Besides, we often have legitimate reasons to be curious about exactly how 
particular historical sequences, such as the evolution of Homo sapiens, occurred. 
On the other hand, it is from the study of many cases that we form a body of 
theory about evolutionary processes. No one historical trajectory contains enough 
information to obtain a very good grasp of the processes that affected its own 
evolution. Data are missing because the record is imperfect. The lineage may be 
extinct, and so direct observation is impossible. Even if the lineage is extant, 
experimentation may be impossible for practical or ethical reasons. Potential 
causal variables may be correlated in particular cases, so understanding their 
behavior may be impossible. The comparative method can often clarify such 
cases. “Scientists” need “historians” and vice versa. 


Darwinian Models of Cultural Evolution 

Over the past two decades, a number of scholars have attempted to understand 
the processes of cultural evolution in Darwinian terms. Social scientists (Camp- 
bell, 1965, 1975; Cloak, 1975; Durham, 1976; Ruyle, 1973) have argued that the 
analogy between genetic and cultural transmission is the best basis for a general 
theory of culture. Several biologists have considered how culturally transmitted 
behavior fits into the framework of neo-Darwinism (Pulliam and Dunford, 1980; 
Lumsden and Wilson, 1981; Boyd and Richerson, 1985; Richerson and Boyd, 
1989b; Cavalli-Sforza and Feldman, 1983; Rogers, 1989). Other biologists and 
psychologists have used the formal similarities between genetic and cultural 
transmission to develop theories describing the dynamics of cultural transmission 
(Cavalli-Sforza and Feldman, 1973, 1981; Cloninger, Rice, and Reich, 1979; 
Eaves, Last, Young, and Martin, 1978). All of these authors are interested in a 
synthetic theory of process applying to how culture works in all cultures, includ- 
ing in other species that might have systems with a useful similarity to human 
culture. Note that this last broadly comparative concern is likely to be useful in 
dissecting the reasons why the human lineage originally became more cultural 
than typical mammals. 1 

The idea that unifies the Darwinian approach is that culture constitutes a
system of inheritance. People acquire skills, beliefs, attitudes, and values from 
others by imitation and enculturation (social learning), and these “cultural
variants,” together with their genotypes and environments, determine their 
behavior. Since determinants of behavior are communicated from one person to 
another, individuals sample from and contribute to a collective pool of ideas that 
changes over time. In other words, cultures have population-level properties
similar to those of gene pools, as different as the two systems of inheritance are in the
details of how they work. (In one respect, the Darwinian study of cultural 
evolution is more Darwinian than the modern theory of organic evolution. 
Darwin not only used a notion of “inherited habits” that is much like the 
modern concept of culture but also thought that organic evolution generally 
included the property of the inheritance of acquired variation, which culture 
does and genes do not.) 

Because cultural change is a population process, it can be studied using 
Darwinian methods. To understand why people behave as they do in a particular 
environment, we must know the nature of the skills, beliefs, attitudes, and values 
that they have acquired from others by cultural inheritance. To do this, we must 
account for the processes that affect cultural variation as individuals acquire 
cultural traits, use the acquired information to guide behavior, and act as models 
for others. What processes increase or decrease the proportion of people in a 
society who hold particular ideas about how to behave? We thus seek to un- 
derstand the cultural analogs of the forces of natural selection, mutation, and 
drift that drive genetic evolution. We divide these forces into three classes: 
random forces, natural selection, and the decision-making forces. 

Random forces are the cultural analogs of mutation and drift in genetic 
transmission. Intuitively, it seems likely that random errors, individual idiosyn- 
crasies, and chance transmission play a role in behavior and social learning. For 
example, linguists have documented a good deal of individual variation in 
speech, some of which is probably random individual variation (Labov, 1972). 
Similarly, small human populations might well lose rare skills or knowledge by 
chance, for example, due to the premature deaths of the only individuals who 
acquired them (Diamond, 1978). 

Natural selection may operate directly on cultural variation. Selection is an 
extremely general evolutionary process (Campbell, 1965). Darwin formulated a 
clear statement of natural selection without a correct understanding of genetic 
inheritance because it is a force that will operate on any system of inheritance 
with a few key properties. There must be heritable variation, the variants must 
affect phenotype, and the phenotypic differences must affect individuals’ chances 
of transmitting the variants they carry. That variants are transmitted by imitation 
rather than sexual or asexual reproduction does not affect the basic argument, 
nor does the possibility that the source of variation is not random. Darwin 
imagined that random variation, acquired variation, and natural selection all 
acted together as forces in organic evolution. In the case of cultural evolution, 
this seems to be the case. It may well be, however, that behavioral variants 
favored by natural selection depend on the mode of transmission. The behaviors 
that maximize numbers of offspring may not be the same as those that maximize
cultural influence on future generations (Boyd and Richerson, 1985).

Decision-making forces result when individuals evaluate alternative behav- 
ioral variants and preferentially adopt some variants relative to others. If many of 
the individuals in a population make similar decisions about variants, especially 
if similar decisions are made for a number of generations, the pool of cultural 
variants can be transformed. Naive individuals may be exposed to a variety of 
models and preferentially imitate some rather than others. We call this force 
biased transmission. Alternatively, individuals may modify existing behaviors or 
invent new ones by individual learning. If the modified behavior is then trans- 
mitted, the resulting force is much like the guided, nonrandom variation of 
“Lamarckian” evolution. Put differently, humans are embedded in a complex 
social network through which they actively participate in the creation and per- 
petuation of their culture. 

The decision-making forces are derived forces (Campbell, 1965). Decisions 
require rules for making them, and ultimately the rules must derive from the 
action of other forces. That is, if individual decisions are not to be random, there 
must be some sense of psychological reward or similar process that causes 
individual decisions to be predictable, in given environments, at least. These 
decision-making rules may be acquired during an earlier episode of cultural 
transmission, or they may be genetically transmitted traits that control the 
neurological machinery for acquisition and retention of cultural traits. The latter 
possibility is the basis of the sociobiological hypotheses about cultural evolution 
(Alexander, 1979; Lumsden and Wilson, 1981). Some authors argue that the 
course of cultural evolution is determined by natural selection operating indi- 
rectly on cultural variation through the decision-making forces. 

Like natural selection, the decision-making forces may improve the fit of the 
population to the environment. The criteria of fit depend on the nature of the 
underlying decision rules. This is easiest to see when the goals of the decision 
rules are closely correlated with fitness. If human foraging practices are adopted 
or rejected according to their energy payoff per unit time (optimal foraging 
theory’s operational proxy for fitness), then the foraging practices used in the 
population will adapt to changing environments much as if natural selection 
were responsible. If the adoption of foraging practices is strongly affected by 
consideration of prestige, say, that associated with male success hunting dan- 
gerous prey, then the resulting pattern of behavior may be different. However, 
there will still be a pattern of adaptation to different environments but now in 
the sense of increasing prestige rather than calories. 


What Makes Change Historical? 

It has often been argued that historical and scientific explanations are different in
kind. Ingold (1985) gives two important versions of this argument. Some authors 
(e.g., Collingwood, 1946) argue that history is uniquely human because it entails 
conscious perception of the past. The second view (e.g., Trigger, 1978) is quite 
different and holds that history involves unique, contingent pathways from the 
past to the future that are strongly influenced by unpredictable, chance events.
We focus on the latter view here. For example, capitalism arose in Europe rather 
than China, perhaps because medieval and early modern statesmen failed to 
create a unified empire in the West (McNeill, 1980), and marsupials dominate
the Australian fauna perhaps because of Australia’s isolation from other con- 
tinents in which placental mammals chanced to arise. In contrast, it is argued, 
scientific explanations involve universally applicable laws. In evolutionary biol- 
ogy and in anthropology, these often take the form of functional explanations, in 
which only knowledge of present circumstances and general physical laws (e.g., 
the principles of mechanics) are necessary to explain present behavior (Mitchell
and Valone, 1990). For example, long fallow horticulture is associated with
tropical forest environments, perhaps because it is the most efficient subsistence 
technology in such environments (Conklin, 1969).

It has been argued, perhaps nearly as often, that this dichotomy is false. 
Eldredge (1989:9) provides a particularly clear and forceful example of a common
objection: all material entities have properties that can change through time.
Even simple entities like molecules are characterized by position, momentum, 
charge, and so on. If we could follow a particular water molecule, we would see 
that these properties changed through time — even the water molecule has a his- 
tory, according to Eldredge. Yet everyone agrees that we can achieve a satisfactory 
scientific theory of water. Historical explanations, Eldredge argues, are just sci- 
entific explanations applied to systems that change through time. We are misled 
because chemists tend to study the average properties of very large numbers of 
water molecules. 

This argument explains too much. Not all change with time is history in the 
sense intended by historically oriented biologists and social scientists. To see this, 
consider an electrical circuit composed of a voltage source, a capacitor, and a fluo- 
rescent light. Under the right conditions, the voltage will oscillate through time, 
and these changes can be described by simple laws. Are these oscillations his- 
torical? In Eldredge’s view they are; the circuit has a history, a quite boring one, 
but a history nonetheless. Yet such a system does not generate unique and 
contingent trajectories. After the system settles down, one oscillation is just like 
the previous one, and the period and amplitude of the oscillations are not 
contingent on initial conditions. They are not historical in the sense that “one 
damn thing after another” (Elton 1967:40) leads to cumulative but unpredictable
change.

What then makes change historical? We think that two requirements cap- 
ture much of what is meant by “history.” These two requirements pose a more 
interesting and serious challenge for reconciling history with a scientific approach 
to explanation. Consider a system like a society or a population that changes 
through time both under the influence of internal dynamics and exogenous 
shocks. Then we suggest that the pattern of change is historical if the following 
statements apply. 

1. Trajectories are not stationary on the timescales of interest. History is more
than just change — it is change that does not repeat itself. On long enough 
timescales, the oscillations in the circuit become stationary, meaning that the 
chance of finding the system in any particular state becomes constant. Similarly, 
random day-to-day fluctuations in the weather do not constitute historical
change if one is interested in organic evolution because, on long evolutionary
timescales, there will be so many days of rain, so many days of sun, and so on. By
choosing a suitably long period of time, we can construct a scientific theory of 
stationary processes using a statistical rather than strictly deterministic approach. 
In the case of nonstationary historical trajectories, a society or biotic lineage 
tends to become gradually more and more different as time goes by. There is no 
possibility of basing explanation on, say, a long-run mean about which the his- 
torical entity fluctuates in some at least statistically predictable way, because the 
mean calculated over longer and longer runs of data continues to change sig- 
nificantly. One of the most characteristic statistical signatures of nonstationary 
processes is that the variance they produce grows with time rather than converging
on a finite value. Note that a process that is historical in one spatiotemporal
frame may not be in another. If we are not too interested in specific
species or societies in given time periods, we can often average over longer 
periods of time or many historical units to extract ahistorical generalizations. 
Any given water molecule has a history, but it is easy to average over many of 
them and ignore this fact. 

2. Similar initial conditions give rise to qualitatively different trajectories. His- 
torical change is strongly influenced by happenstance. This requires that the 
dynamics of the system must be path-dependent; isolated populations or soci- 
eties must tend to diverge even when they start from the same initial condition 
and evolve in similar environments. Thus, for example, the spread of a favored 
allele in a series of large populations is not historical. Once the allele becomes 
sufficiently common, it will increase at first exponentially and then slowly, as- 
ymptotically approaching fixation. Small changes in the initial frequencies, pop- 
ulation size, or even degree of dominance will not lead to qualitative changes in 
this pattern. In separate but similar environments, populations will converge on 
the favored allele. Examples of convergence in similar environments are common — 
witness the general similarity in tropical forest trees and many of the behaviors 
of the long fallow cultivators who live among them the world over. On the other 
hand, there are also striking failures of convergence — witness the many unique 
features of Australian plants, animals, and human cultures. The peculiar hanging 
leaves of eucalypts, the bipedal gait of kangaroos, and the gerontocratic structure 
of Australian aboriginal societies make them distinctively different from the 
inhabitants of similar temperate and subtropical dry environments on other 
continents. 
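
The statistical signature named in the first criterion can be illustrated with a small numerical sketch. It is not a model from this chapter: the update rule, the parameter `phi`, the unit-variance noise, and the number of replicates are all assumptions chosen for illustration. Many replicate "lineages" follow x[t+1] = phi * x[t] + noise. With phi = 1 each lineage is a random walk, and the variance among lineages keeps growing. With phi below 1 each lineage is pulled back toward a long-run mean, and the variance settles near a finite value.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def simulate(phi, steps, replicates, seed=0):
    """Return the among-replicate variance after each time step.

    Hypothetical illustration: each replicate follows
    x[t+1] = phi * x[t] + Gaussian noise with unit variance.
    phi = 1 gives a random walk (nonstationary);
    0 <= phi < 1 gives a mean-reverting (stationary) process.
    """
    rng = random.Random(seed)
    states = [0.0] * replicates
    variances = []
    for _ in range(steps):
        states = [phi * x + rng.gauss(0.0, 1.0) for x in states]
        variances.append(variance(states))
    return variances

walk = simulate(phi=1.0, steps=400, replicates=1000)
reverting = simulate(phi=0.9, steps=400, replicates=1000)

# Compare the two processes at a few time points.
for t in (50, 100, 200, 400):
    print(t, round(walk[t - 1], 1), round(reverting[t - 1], 1))
```

In runs of this sketch the first column of variances should rise roughly in proportion to elapsed time, while the second should level off early. Only the first pattern is "historical" in the sense of criterion 1.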

It is important not to blur the distinction between simple trajectories and 
true historical change; it is easy to see how evolutionary processes like natural 
selection give rise to simple, regular change like the spread of a favored allele or 
subsistence practice. However, it is not so easy to see how such processes give 
rise to unique, contingent pathways. Scientists tout the approach to steady states 
and convergence in similar situations as evidence for the operation of natural 
“laws,” so it seems natural to conclude that a lack of stationarity and conver- 
gence are evidence of processes that cannot be subsumed in the standard con- 
ceptions of science. Our argument is that things are not at all that simple. There 
is every reason to expect that perfectly ordinary scientific processes, ordinary in 
the sense that they result from natural causes and are easily understood by
conventional methods, regularly generate history in the sense defined by these 
two criteria. 


How Do Adaptive Processes Give Rise to History? 

Let us begin with the two most straightforward answers to this question. First, 
it could be that most evolutionary change is random. Much change in organic 
evolution may be the result of drift and mutation, and much change in cultural 
evolution may result from analogous processes. The fact that drift is a very slow 
process would explain long-term evolutionary trends. Raup (1977) and others
argue that random-walk models produce phylogenies that are remarkably similar 
to real ones. The fact that cultural and genetic evolutionary change is random 
would allow populations in similar environments to diverge from each other. It 
seems likely that some variation in both cases evolves mainly under the influence 
of nonadaptive forces — for example, much of the eukaryotic genome does not 
seem to be expressed and evolves under the influence of drift and mutation 
(Futuyma, 1986:447). Similarly, the arbitrary character of symbolic variation
suggests that nonadaptive processes are likely to be important in linguistic 
change and similar aspects of culture. In both cases, isolated populations diverge 
at an approximately constant rate on the average. However, to understand why a 
particular species is characterized by a particular DNA sequence, or why a par- 
ticular people use a particular word for mother, one must investigate the se- 
quence of historical events that led to the current state. 

It is also possible that historical change is generated by abiotic environmental 
factors (Valentine and Moores, 1972). Long-term trends in evolution could result
from the accurate adaptive tracking of a slowly changing environment. For ex- 
ample, during the last hundred million years, there has been a long-term increase 
in the degree of armoring of many marine invertebrates living on rocky substrates 
and a parallel increase in the size and strength of feeding organs among their 
predators (Vermeij, 1987; Jackson, 1988). It is possible that these biotic trends
have been caused by long-run environmental changes over the same period — for 
example, an increase in the oxygen content of the atmosphere (Holland, 1984).
Similarly, beginning perhaps as much as 17,000 years ago, humans began a shift 
from migratory big game hunting to sedentary, broad-spectrum, more labor- 
intensive foraging, finally developing agriculture about 7,000 years ago (Henry, 
1989). Many authors (e.g., Reed, 1977) have argued that the transition from
glacial to interglacial climate that occurred during the same period is somehow 
responsible for this change. Similarly, differences among populations in similar 
environments may result from the environments actually being different in some 
subtle but important way. For example, Westoby (1989) has argued that some
of the unusual features of the Australian biota result from the continent-wide 
predominance of highly weathered, impoverished soils on this relatively undis- 
turbed continental platform. Perhaps the failure of agriculture to develop in or 
diffuse to aboriginal Australia, despite many favorable preconditions and the 
presence of cultivators just across the Torres Strait, also reflects poor soils. 

It is more difficult to understand how adaptive processes like natural selection
can give rise to historical trajectories. There are two hurdles. First,
adaptive processes in both organic and cultural evolution appear to work on 
rather short timescales compared to the timescales of change known from the 
fossil record, archaeology, and history. Theory, observation, and experiment 
suggest that natural selection can lead to change that is much more rapid than 
any observed in the fossil record (Levinton, 1988:342-347). For example, the
African Great Lakes have been the locus of spectacular adaptive radiations of 
fishes amounting to hundreds of highly divergent forms from a few ancestors in 
the larger lakes (Lowe-McConnell, 1975). The maximum timescales for these
radiations, set by the ages of the lakes and not counting that they may have dried 
up during the Pleistocene epoch, are only a few million years. The radiation in 
Lake Victoria (about 200 endemic species) seems to have required only a few
hundred thousand years. Adaptive cultural change driven by decision-making 
forces can be very fast indeed, as is evidenced by the spread of innovations 
(Rogers, 1983). It is not immediately clear how very short timescale processes
such as these can give rise to longer-term change of the kind observed in both 
fossil and archaeological records, unless the pace of change is regulated by envi- 
ronmental change. In the absence of continuing, long-term, nonstationary en- 
vironmental change, adaptive processes seem quite capable of reaching equilibria 
in relatively short order. In other words, both cultural and organic evolution 
seem, at first glance, to be classic scientific processes that produce functional 
adjustments too rapidly to account for the slow historical trajectories we actually 
observe. 

Second, it is not obvious why adaptive processes should be sensitive to initial 
conditions. Within anthropology the view that adaptive processes are ahistorical 
in this sense underpins many critiques of functionalism. Many anthropologists 
claim that it is self-evident that cultural evolution is historical and that, therefore,
adaptive explanations (being intrinsically equilibrist and ahistorical) must
be wrong. For example, Hallpike (1986) presents a variety of data that show that
peoples living in similar environments often have quite different social organi- 
zation, and historically related cultures often retain similar social organizations 
despite occupying radically different environments. Because functionalist models 
predict a one-to-one relationship between environment and social organization, 
he argues, these data falsify the functionalist view. Indeed, functionalists like 
Cohen (1974:86) expect to see history manifest only in the case of functionally
equivalent symbolic forms. Biologists have generally been more aware that a 
population’s response to selection depends on phylogenetic and developmental 
constraints and, therefore, that evolutionary trajectories are, at least to a degree, 
path-dependent. Nonetheless, lack of convergence is sometimes used to argue 
the lack of importance of natural selection. Should selection not cause popula- 
tions exposed to similar environments to converge on similar adaptations? 
Certainly, some striking convergences from unlikely ancestors do exist. 

Here we argue that path dependence and long-term change are likely to be 
consequences of any adaptive process analogous to natural selection. Our claims 
are rather general and are thus independent of the nature of the transmission 
process (genetic or cultural) and of the details of development. Let us begin with
an especially simple example of genetic evolution. Consider a large population of
organisms in which individuals’ phenotypes can be represented as a number of 
quantitative characters. Let us assume that there are no constraints on what can 
evolve due to properties of the genetic system itself. One model with this 
property assumes that the distribution of additive genetic values 2 for each char- 
acter is Gaussian, that there are no genetic correlations among characters, that no 
genotype-environment interactions exist, and that mutation maintains a constant 
amount of heritable variation for each character. Further, assume that the fitness 
of each individual depends only on its own phenotype, not on the frequency of 
other phenotypes or the population density, and there is no environmental 
change. With these assumptions, it can be shown that the change in the vector of 
mean values for each character is along the gradient of the logarithm of average 
fitness (Lande, 1979). In other words, the mean phenotype in the population 
changes in the direction that maximizes the increase in the average fitness of the 
population. This is the sort of situation in which selection, and similar processes 
in the cultural system, ought to produce optimal adaptations in the straight- 
forward manner depicted in elementary textbooks. 
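
For readers who prefer symbols, the result attributed to Lande (1979) is commonly written in the form below. The notation is supplied here for illustration and does not appear in the text: z-bar is the vector of character means, W-bar is average fitness, and G is the matrix of additive genetic variances and covariances.

```latex
\Delta \bar{\mathbf{z}} \;=\; \mathbf{G}\, \nabla_{\bar{\mathbf{z}}} \ln \bar{W}(\bar{\mathbf{z}})
```

Under the assumption of no genetic correlations, G is diagonal, so each character mean moves uphill on the log mean fitness surface at a rate scaled by its own additive genetic variance. If those variances are equal across characters, the change lies exactly along the gradient.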

In this simple model the evolutionary trajectory of the population will be 
completely governed by the shape of average fitness as a function of mean phe- 
notype. If the adaptive topography has a unique maximum, then every popu- 
lation will evolve to the same equilibrium mean phenotype, independent of its 
starting position, and, once there, be maintained by stabilizing selection. On the 
other hand, if there is more than one local maximum, different equilibrium 
outcomes are possible depending on initial condition. The larger the number of 
local maxima, the more path-dependent the resulting trajectories will be (see 
figure 15.1). 
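
The contrast between one peak and several can be sketched numerically. The surfaces below are invented for illustration and are not taken from figure 15.1: log mean fitness is the logarithm of a sum of Gaussian bumps, the step size `g` stands in for an additive genetic variance assumed equal for both characters, and the starting points are arbitrary. Each "population" simply moves its mean phenotype along the gradient of log mean fitness.

```python
import math

def log_fitness(z, peaks):
    """Log mean fitness: log of a sum of Gaussian bumps.

    Each peak is (centre_x, centre_y, height, width); all values hypothetical.
    """
    total = sum(
        h * math.exp(-((z[0] - px) ** 2 + (z[1] - py) ** 2) / (2 * w * w))
        for (px, py, h, w) in peaks
    )
    return math.log(total)

def gradient(f, z, eps=1e-6):
    """Central-difference estimate of the gradient of f at z."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        up[i] += eps
        down = list(z)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def evolve(z0, peaks, g=0.05, generations=2000):
    """Move the mean phenotype uphill on log mean fitness each generation."""
    z = list(z0)
    for _ in range(generations):
        grad = gradient(lambda p: log_fitness(p, peaks), z)
        z = [zi + g * gi for zi, gi in zip(z, grad)]
    return [round(zi, 2) for zi in z]

unimodal = [(0.0, 0.0, 1.0, 1.0)]
multimodal = [(-2.0, 0.0, 1.0, 1.0), (2.0, 0.0, 1.2, 1.0)]

# Two nearby starting populations on each surface.
print(evolve((-0.3, 1.0), unimodal), evolve((0.3, 1.0), unimodal))
print(evolve((-0.3, 1.0), multimodal), evolve((0.3, 1.0), multimodal))
```

On the single-peaked surface both populations should finish at the same place. On the two-peaked surface the same pair of starting points should end on different peaks, even though nothing but deterministic uphill movement is at work.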

Unfortunately, we do not know what real adaptive topographies look like, 
and, as Lande (1986) has pointed out, there is little chance that we will be able 
to determine their shape empirically. In evolutionary texts, adaptive topo- 
graphies are commonly depicted as a smooth three-dimensional surface with a 
small number of local maxima. However, if evolutionary “design problems” are 
similar to the engineering ones, this picture is misleading. Experience with en- 
gineering design problems suggests that real adaptive topographies are often 
extremely complex, with long ridges, multiple saddle points, and many local 
optima — more akin to the topographic map of a real mountain range than the 
smooth textbook surfaces. 

A computer design problem discussed by Kirkpatrick, Gelatt, and Vecchi 
(1983) provides an excellent example. Computers are constructed from large 
numbers of interconnected circuits, each with some logical function. Because the 
size of chips is limited, circuits must be divided among different chips. Because 
signals between chips travel more slowly and require more power than signals 
within chips, designers want to apportion circuits among chips so as to minimize 
the number of connections between them. For even moderate numbers of circuits, 
there is an astronomical number of solutions to this problem. Kirkpatrick et al. 
present an example in which the 5,000 circuits that make up the IBM 370 microprocessor
were to be divided between two chips. Here there are about 10^1503
possible solutions! This design problem has two important qualitative properties:

Figure 15.1. This figure shows two adaptive topographies. The axes are the mean
genetic value in a population for two characters. The contour lines give contours of
equal mean fitness. Figure 15.1a shows a simple unimodal adaptive topography.
Populations beginning at different initial states all achieve the same equilibrium
state. Figure 15.1b shows a complex, multimodal topography. Initially similar
populations diverge owing only to the influence of selection.


1. It has a very large number of local optima. That is, there is a large number of
arrangements of circuits with the property that any simple rearrangement
increases the number of connections between chips. This means that any search
process that simply goes uphill (like our model of genetic evolution) can end up
at any one of a very large number of configurations. An unsophisticated
optimizing scheme will improve the design only until it reaches one of the many
local optima, which one depending upon starting conditions. For example, for
the 370 design problem, several runs of a simple hill-climbing algorithm
produced between 677 and 730 interconnections. The best design found (using a
more sophisticated algorithm) required only 183 connections.

2. There is a smaller, although still substantial, number of arrangements with 
close to the optimal number of interconnections. That is, there are many qualita- 
tively different designs that have close to the best payoff. In the numerical 
example there are on the order of sqrt(5,000) ≈ 70 such arrangements.
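
The dependence of simple hill climbing on starting conditions can be illustrated with a toy version of the partitioning problem. The sketch below is only an illustration and is not the procedure used by Kirkpatrick et al.: the random wiring, the problem size of 24 circuits, the pairwise-swap move, and all function names are assumptions made for the example. Each run starts from a random balanced assignment of circuits to two chips and accepts only swaps that reduce the number of connections between chips, so it stops at the first local optimum it reaches.

```python
import random

def cut_size(chip_of, wires):
    """Count wires whose two endpoints sit on different chips."""
    return sum(1 for a, b in wires if chip_of[a] != chip_of[b])

def hill_climb(chip_of, wires):
    """Swap circuits between chips while a swap strictly reduces the cut.

    Swapping one circuit from each chip keeps the chips equally full.
    The loop ends after a full pass in which no swap helps.
    """
    chip_of = list(chip_of)
    best = cut_size(chip_of, wires)
    improved = True
    while improved:
        improved = False
        for i in range(len(chip_of)):
            for j in range(i + 1, len(chip_of)):
                if chip_of[i] == chip_of[j]:
                    continue
                chip_of[i], chip_of[j] = chip_of[j], chip_of[i]
                trial = cut_size(chip_of, wires)
                if trial < best:
                    best = trial
                    improved = True
                else:
                    # Undo a swap that does not improve the design.
                    chip_of[i], chip_of[j] = chip_of[j], chip_of[i]
    return chip_of, best

def is_local_optimum(chip_of, wires):
    """True if no single swap between chips lowers the cut size."""
    base = cut_size(chip_of, wires)
    for i in range(len(chip_of)):
        for j in range(i + 1, len(chip_of)):
            if chip_of[i] != chip_of[j]:
                trial = list(chip_of)
                trial[i], trial[j] = trial[j], trial[i]
                if cut_size(trial, wires) < base:
                    return False
    return True

# Hypothetical problem instance: 24 circuits joined by 60 random wires.
rng = random.Random(0)
N_CIRCUITS = 24
wires = [tuple(rng.sample(range(N_CIRCUITS), 2)) for _ in range(60)]

results = []
for run in range(5):
    start = [0] * (N_CIRCUITS // 2) + [1] * (N_CIRCUITS // 2)
    rng.shuffle(start)
    final, final_cut = hill_climb(start, wires)
    results.append((cut_size(start, wires), final_cut, final))
    print(f"run {run}: start cut {results[-1][0]}, local optimum cut {final_cut}")
```

Runs that begin from different random assignments may stop at different connection counts, which is the sensitivity to starting conditions described in point 1.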

Kirkpatrick et al. (1983) show that two other computer design problems, the 
arrangement of chips on circuit boards and the routing of wiring among chips, have 
similar properties. These three computer design problems are not unlike evolu- 
tionary “design” problems in biology — the localization of functions in organs, the 
arrangement of organs in a body, and the routing of the nervous and circulatory 
networks — that are likely to generate complex adaptive topographies. Moreover, as 
anyone experienced with the numerical solution of real-world optimization prob- 
lems will testify, these results are quite typical. To quote from the introduction of 
a recent textbook on optimization, “many common design problems, from
reservoirs to refrigerators, have multiple local optima, as well as false optima, that
make conventional [meaning iterative hill-climbing] optimization schemes risky”
(Wilde, 1978). Thus, if the analogy is correct, small differences in initial conditions
may launch different populations on different evolutionary trajectories, which end 
with qualitatively different equilibrium phenotypes. 

It is important to see that this history-generating property does not depend 
on the existence of genetic or developmental constraints. At least as defined in 
Maynard Smith et al. (1985) there are no genetic or developmental constraints in 
the simple model of selection acting on a complex topography. Every combi- 
nation of phenotypes can be achieved, and there is no bias in the production of 
genetic variation. Path dependence results from the facts that different char- 
acters interact in a complex way to generate fitness and that the direction of 
natural selection depends on the shape of the local topography. 

Of course, developmental constraints could also play a major role in con- 
fining lineages to historically determined bauplans, as many biologists have ar- 
gued (e.g., Seilacher, 1970). Further, complex topographies and developmental 
constraints may be related. Wagner (1988) hypothesizes, based on a model of 
multivariate phenotypic evolution, that fitness functions will generally be “ma- 
lignant” and that developmental constraints act to make phenotypes more re- 
sponsive to selection. By malignant, Wagner means that the fitness of any one 
trait is likely to depend on the values of many other traits. For example, larger 
size may be favored by selection for success in contests for mates but only if 
many traits of the respiratory, skeletal, and circulatory systems change in concert 
to support larger size. If phenotype is unconstrained, response to selection will 
be slow because of the need to change so many independent characters at once, 
whereas developmental constraints confine the expression of variation to a few 
axes that can respond rapidly to selection. Thus, the bill is a simple, rather 
constrained part of the anatomy of birds, yet selection has remodeled bills along 
the relatively few dimensions available (length, width, depth, curvature) to 
support an amazing variety of specializations. Developmental constraints may 
be a solution to the complexity of adaptive topographies, albeit one that limits 
lineages to elaborating a small set of historically derived basic traits as they 
respond to new adaptive challenges. 

Path dependence can arise from the action of functional processes in a 
cultural system of inheritance as well. For example, decision-making forces arise 
when people modify culturally acquired beliefs in the attempt to satisfy some 
goal. If people within a culture share the same goal, this process will produce an 
evolutionary trajectory very similar to one produced by natural selection — the 
rate of change of the distribution of beliefs in a population will depend on the 
amount of cultural variation and the shape of an analog of the adaptive topog- 
raphy in which fitness is replaced by utility (the extent to which alternative 
beliefs satisfy the goal) (Boyd and Richerson, 1985, ch. 5). The details of the 
transmission and selective processes are not crucial, as long as the processes that 
lead to change can be represented as climbing a complex topography. 

It is unclear whether adding genetic constraints will increase or decrease the 
potential for path dependence. One sort of genetic constraint can be added by 
allowing significant genetic correlations among characters (Lande, 1986). This
assumption means that some mutants are more probable than others. As long as
there is some genetic variation in each dimension, the vector of phenotypic
means will still go uphill but not necessarily in the steepest direction. The
population will come to equilibrium at one of the local peaks, although this might
be quite distant from the equilibrium that the population would have reached
had there been no genetic correlations (Lande, 1979, 1986). More generally,
most genetic architectures do not result in Gaussian distributions of genetic
values (Turelli and Barton, 1990), and analyses of two-locus models suggest that
dynamics resulting from the combination of linkage and selection may create
many locally stable equilibria even when the fitness function is unimodal (Karlin
and Feldman, 1970). This suggests that adding more genetic realism would
increase the potential for path dependence. On the other hand, computer scientists
(Holland, 1975; Brady, 1985) have found that optimization algorithms closely
modeled on multilocus selection are less likely to get stuck on local optima than 
simple iterative hill-climbing algorithms. The issue of genetic constraints is still 
open. 

The situation in cultural evolution is similar, even if not so well studied. On 
the one hand, many anthropologists stress the rich structure of culture. To the 
extent that such structure exists, path dependence is likely to be important. On 
the other hand, Bandura (1977), a pioneering student of the processes of social
learning, argues that there is relatively little complex structuring of socially 
learned behavior. The many examples of cultural syncretism and diffusion of 
isolated elements of technology suggest his view ought to be taken seriously. 
Perhaps complex structure is most important in the symbolic aspects of culture, 
but symbolic variation may be only weakly constrained by functional
considerations (Cohen, 1974). According to Cohen, we have to use purely
contingent historical explanations for things such as linguistic variation, while simple
functional explanations suffice for economic, political, and social-organizational 
phenomena. 

Long-term nonstationary trends in evolution can result if there is some 
process that causes populations to shift from one peak to another and if that 
process acts on a longer time scale than adaptive processes like natural selection. 
So far we have assumed that populations are large and the environment is un- 
changing. With these assumptions, populations will usually rapidly reach an 
adaptive peak and then stay there indefinitely. They will not exhibit the kind of 
long-run change that we have required for change to be historical. Wright (e.g., 
1977) long argued that drift plays an important role in causing populations to 
shift from peak to peak, and then competition among populations favors the 
population on the higher peak. Chance variations in gene frequency in small 
populations could lead to the occasional crossing of adaptive valleys and the 
movement to higher peaks. Recently, several authors have considered
mathematical models of this process (Barton and Charlesworth, 1984; Newman,
Cohen, and Kipnis, 1985; Lande, 1986; Crow, Engels, and Denniston, 1990). These
studies suggest that the probability that a shift to a new peak will occur during 
any time period is low; however, when a shift does occur, it occurs very rapidly. 
If this view is correct, drift should generate a long-run pattern of change in which 
populations wander haltingly up the adaptive topography from lower local peaks
to higher ones. It is also implausible that environments remain constant either in
space or in time. As environments change, the shape of the adaptive topography
shifts, causing peaks to merge, split, or disappear, or causing temporary ridges to
appear that connect a lower peak to a higher one. Thus, populations will occasionally
slide from one peak to another. As long as such events are not too common, 
environmental change will also lead to long-run change. Such change might 
appear gradual if there are many small valleys to cross or punctuational if there 
are a few big ones. 

Adding social or ecological realism to the basic adaptive hill-climbing model 
of evolution probably increases the potential for multiple stable equilibria. In the 
simple model, an individual’s fitness depended only on his phenotype. When 
there are social or ecological interactions among individuals within a population, 
individual fitness will depend on the composition of the population as a whole. 
When this is the case, evolutionary dynamics can no longer be represented in 
terms of an invariant adaptive topography. However, they may still be charac- 
terized by multiple stable equilibria. Moreover, the fact that many quite simple 
models of frequency dependence have this property suggests that frequency 
dependence may usually increase the potential for path-dependent historical 
change. 

Models of the evolution of norms provide an interesting example of how 
frequency dependence can multiply the number of stable equilibria. Hirshleifer 
and Rasmusen (1989) have analyzed a model in which a group of individuals 
interact over a period of time. During each interaction, individuals first have the 
opportunity to cooperate and thereby produce a benefit to the group as a whole 
but at some cost to themselves; they then have a chance to punish defectors at no 
cost to themselves. These authors show that strategies in which individuals co- 
operate, and punish noncooperators and nonpunishers, are stable in the game- 
theoretic sense. However, they also show that punishment strategies of this kind 
can stabilize any behavior — cooperation, noncooperation, wearing white socks, 
or anything else. We (chapter 9) show that the same conclusions apply in an 
evolutionary model even when punishment is costly. This form of social norm 
can stabilize virtually any form of behavior as long as the fitness cost of the 
behavior is small compared to the costs of being punished. 

More generally, coordination is an important aspect of several kinds of social 
interactions (Sugden, 1986). In a pure game of coordination, it does not matter 
what strategy is used, as long as it is the strategy that is locally common. Driving 
on the left versus right side of the road is an example. It does not matter which 
side we use, but it is critical that we agree on one side or the other. This property 
of arbitrary advantage to the common strategy is shared by many symbolic and 
communication systems and allows multiple equilibria whenever there are mul- 
tiple conceivable strategies. In many other common kinds of social interactions, 
elements of coordination and conflict are mixed. In such games, all individuals 
are better off if they use the same strategy, even though the relative advantages 
of using the strategy differ greatly from individual to individual, and some in- 
dividuals would be much better off if another strategy were common. As long as 
the coordination aspect of such interactions is strong enough, multiple stable 
equilibria will exist. Arthur (1990) shows how locational decisions of industrial
enterprises could give rise to historical patterns due to coordination effects. It is 
often advantageous for firms to locate near other firms in the same industry 
because specialized labor and suppliers have been attracted by preexisting firms. 
The chance decisions of the first few firms in an emerging industry can establish 
one as opposed to another area as the Silicon Valley of that industry. More 
generally, historical patterns can arise in the many situations where there are 
increasing returns to scale in the production of a given product or technology. 
Merely because the “qwerty” keyboard is common, it is sensible to adopt it 
despite its inefficiencies. 
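
The way a pure coordination game produces multiple stable equilibria can be sketched numerically. The example below is a minimal illustration under stated assumptions: it uses a discrete replicator-style update, equal payoffs of 1 for matching a partner, and a baseline fitness of 1, none of which come from the sources cited above. Strategy A is favored only when it is already the more common strategy, so a population starting just below a frequency of one-half ends up at the opposite equilibrium from one starting just above it.

```python
def next_frequency(p, payoff_a=1.0, payoff_b=1.0, baseline=1.0):
    """One generation of a replicator-style update for strategy A.

    p is the frequency of A. An individual earns its payoff only when it
    meets a partner using the same strategy (assumed pure coordination).
    """
    fitness_a = baseline + payoff_a * p
    fitness_b = baseline + payoff_b * (1.0 - p)
    mean_fitness = p * fitness_a + (1.0 - p) * fitness_b
    return p * fitness_a / mean_fitness

def run(p0, generations=100):
    """Iterate the update from initial frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p)
    return p

slightly_below = run(0.49)  # A starts as the slightly rarer strategy
slightly_above = run(0.51)  # A starts as the slightly more common strategy
print("start 0.49 ->", slightly_below)
print("start 0.51 ->", slightly_above)
```

With these assumed payoffs, a frequency of exactly one-half is an unstable equilibrium, and which convention becomes fixed depends only on which side of it the population happens to begin.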

Interactions between populations and societies (or elements within societies 
such as classes) can give rise to multiple stable equilibria. Models of the co- 
evolution of multiple populations have many of the same properties as fre- 
quency- and density-dependent selection within populations, although the 
theory is less well developed (Slatkin and Maynard Smith, 1979). The evolution 
of one population or society depends upon the properties of others that interact 
with it, and many different systems of adjusting the relationships between the 
populations may be possible. For example, Cody (1974:201) noted that com- 
peting birds replace each other along an altitudinal gradient in California but 
latitudinally in Chile. Given the rather similar environments of these two places, 
it is plausible that both systems of competitive replacement are stable and which 
one occurs is due to accidents of history. 

The stratification of human societies into privileged elites and disadvantaged 
commoners derives from the ability of elites to control high-quality resources 
or to exploit commoners using strategies that are similar to competitive and 
predatory strategies in nature. Insko et al. (1983) studied the evolution of social 
stratification in the social psychology laboratory. They showed that elites could 
arise in both an experimental condition that mimicked freely chosen trade re- 
lations and one that mimicked conquest. Elites were approximately as well off in 
both conditions and, insofar as they controlled things, would have no motivation 
to change social arrangements. It seems plausible that the diversity of political 
forms of complex societies could result from many arrangements of relations 
between constituent interest groups being locally stable. The distinctive differ- 
ences between the Japanese, American, and Scandinavian strategies for operating 
technologically advanced societies could well derive from historic differences in 
social organization that have led to different, stable arrangements between in- 
terest groups, in spite of similar revolutionary changes in production techniques 
of the last century or two. 

Social or ecological interactions may also give rise to dynamic processes that 
are sensitive to initial conditions and have no stable equilibria. Lande (1981) 
analyzed a model of one such process, sexual selection in which females have a 
heritable preference for mates that is based on a heritable, sex-limited male 
character. According to his model, when the male character and female pre- 
ferences are sufficiently correlated genetically, female choice can create a self- 
reinforcing “runaway” process that causes the mean male character and the mean 
female preference either to increase or decrease indefinitely, even in the presence 
of stabilizing selection on the male character. Selection cannot favor female 
variants that choose fitter males (in the usual sense of fitter) because most females
are choosing mates with an exaggerated character. The “sensible” female's sons 
will be handicapped in the mating game. The direction that evolution takes 
depends on the details of the initial conditions in Lande’s model. His quantitative 
character will be elaborated in one direction or the other depending on how 
evolution drifts away from an unstable line of equilibria. Although the inter- 
pretation of this model is controversial, it is easy to imagine that the exaggerated 
characters of polygynous animals like birds of paradise and peacocks result from 
the runaway process. We (Boyd and Richerson, 1985, ch. 8, 1987; Richerson and
Boyd, 1989a) have argued that quite similar processes may arise in cultural
evolution when individuals are predisposed to imitate some individuals on the
basis of culturally heritable characteristics. The use of some character associated
with prestige (stylish dress, for example) as an index of whom to imitate has the
same potentially unstable runaway dynamic as Lande’s model of mate choice
sexual selection, and even casual observation suggests that prestige systems do
follow contingent historical trajectories. Fashions in clothing, for example, evolve
in different directions in different societies, often without much regard for
practicality.

Figure 15.2. Both parts show the trajectories of population growth generated by the
same model of social evolution for two slightly different initial population sizes.
In 15.2a the society goes through three distinct phases of growth, while in 15.2b, there are
only two.

Perhaps the most clearly historical patterns of change result when social or 
ecological interaction leads to “chaotic” dynamics. For example, Day and Walter 
(1989) have analyzed an extremely interesting model of social evolution in 
which population growth leads to reduced productivity, social stratification, and 
eventually to a shift from one subsistence technology to a more productive one. 
The resulting trajectories of population size are shown in figure 15.2. Population 
grows, is limited by resource constraints, and eventually technical substitu- 
tion occurs, allowing population to grow once more. The only difference be- 
tween figure 15.2a and 15.2b is a very small difference in initial population size. 
Nonetheless, this seemingly insignificant difference leads to qualitatively differ- 
ent trajectories — one society shows three separate evolutionary stages, and the 
second only two. 
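
Sensitivity to initial conditions of this kind can be demonstrated with a much simpler system than the Day and Walter model. The sketch below uses the logistic map as a stand-in, which is an assumption made purely for illustration; the growth parameter r = 4, the starting value 0.2, and the perturbation of one part in a billion are likewise hypothetical choices and are not taken from their paper. Two trajectories that begin almost identically stay close for a few steps and then separate completely.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate x -> r * x * (1 - x), returning the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two "populations" whose initial states differ by one part in a billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)

# Absolute gap between the two trajectories at every step.
gaps = [abs(x - y) for x, y in zip(a, b)]
print("gap at step 0:", gaps[0])
print("largest gap in first 5 steps:", max(gaps[:6]))
print("largest gap over 100 steps:", max(gaps))
```

The early gaps remain tiny while the later ones become as large as the variable's whole range, so even a well-understood deterministic rule does not allow precise long-run prediction from imperfectly measured starting conditions.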


Conclusion 

Scientific and historical explanations are not alternatives. Contingent, diverging 
pathways of evolution and long-term secular trends can result from processes 
that differ only slightly from those that produce rapid, ahistorical convergence to 
universal equilibria. Late nineteenth- and early twentieth-century scientists gave
up restricting the term “scientific” to deterministic, mechanistic explanations
and began to admit “merely” statistical laws into the fundamental corpus of 
physics (very reluctantly in some cases — recall Einstein’s famous complaint 
about God not playing dice with the universe to express his distaste for quantum 
mechanics). Similarly, historical explanations cannot be distinguished from other 
kinds of scientific explanations except that some models (and, presumably, the 
phenomena they represent) generate trajectories that meet our definition of 
being historical. These history-generating processes do not depend on exotic 
forces or immaterial causes that ought to excite a scientist’s skepticism; perfectly 
mundane things will do. There are challenging complexities in historical pro- 
cesses. For example, even well-understood processes will not allow precise 
predictions of future behavior when change is historical. However, all the tools 
of conventional scientific methods can be brought to bear on them. For example, 
it should be possible to use measurement or experiment to determine if a pro- 
cess is in a region of parameter values where chaotic behavior is expected. At 
the same time, the historian’s traditional concern for critically dissecting 
the contingencies that contribute to each unique historical path is well taken. 
Process-oriented “scientific” analyses help us understand how history works, and 
“historical” data are essential to test scientific hypotheses about how
populations and societies change.

In the biological and social domains, “science” without “history” leaves 
many interesting phenomena unexplained, while “history” without “science” 
cannot produce an explanatory account of the past, only a listing of disconnected 
facts. The generalizing impulses of science require historical methods, because 
the phenomena to be understood are genuinely historical and because historical 
data are essential for developing generalizations about evolutionary processes. In 
return, generalizations derived from history and by the study of contemporary 
systems would seem to be essential for an understanding of particular cases. The 
amount of data available from the past is usually very limited, and the number of 
possible reconstructions of the past is correspondingly large. Some sort of theory 
has to be applied to make some sense of the isolated facts. Historians (e.g., 
Braudel, 1972) and paleontologists (e.g., Valentine, 1973) often cast their nets 
rather widely in search of help in interpreting the documents and fossils. McNeill 
(1986) advocates a “scientific,” generalization-seeking approach to history much 
in this spirit. Consider the question of which of the potential history-producing 
processes we have discussed are most important in explaining the changes in 
human societies over the last few tens of thousands of years. Generalizing dis- 
ciplines such as climatology and cultural ecology are certainly relevant to the task 
in general and to the understanding of how particular societies changed in par- 
ticular environments (Henry, 1989). At the same time, because these historical 
societies faced Pleistocene climates and the transition to the Holocene, and 
because they developed a series of technical, social, and ideological innovations 
that are the foundation of modern human societies by processes that are not 
open to direct observation, the historical and archaeological records provide 
crucial data not available from ahistorical study. To the extent that the processes 
we have described are important, “science” and “history” cannot be disen- 
tangled as separate intellectual enterprises. 

Darwinian models of organic and cultural evolution illustrate how little 
distinction can be made between the two approaches. Such models can produce 
historical patterns of change by a rather large number of different mechanisms. 
We have argued that historical change is distinguished by two attributes: the 
tendency of initially similar systems to diverge and the occurrence of long-term 
change. Evolutionary models, including those that assume that selection or 
analogous cultural processes increase adaptiveness in each generation, readily 
generate multiple stable equilibria. Populations with similar initial conditions 
may evolve toward separate equilibria. Random genetic drift and analogous cul- 
tural processes, coupled with environmental change, may cause populations to 
shift from one equilibrium to another. It is plausible that peak shifting by pop- 
ulations (or the shifting of peaks due to environmental change) occurs at a slow 
enough rate to explain long-term secular trends. 

Many anthropologists take as their task the explanation of differences among 
human societies and suggest that most such differences are historical in char- 
acter. If explanation of such variation is mainly historical, then anthropologists 
might reasonably ask, what is the point of Darwinian models of cultural change 
when historical or “contextual” explanations will be much more productive?
The reasons are as follows. 

First, the premise is often incorrect. Genuine convergences are common and 
explaining them requires some theory based on common processes of cultural 
change. Perhaps the most spectacular cultural example is the convergence of 
social organization in stratified, state-level societies in the Old and New Worlds. 
For example, Cortez in 1519 found that Aztec society was quite similar to his own
in important ways: it contained familiar roles, hereditary nobility, priests, war- 
riors, craftsmen, peasants, and so on. The bureaucracy was organized hierar- 
chically. This convergence is remarkable because the Spanish and Aztec states 
evolved independently from a hunter-gatherer ancestry. The cultural lineages 
that resulted in these two states were without known cultural contact for several 
thousand years before state formation began in either (Wenke, 1980).

Second, Darwinian models can make useful predictions. They can tell us 
why some forms of behavior or social organization are never observed and others 
are common. For example, kinship is an extremely common principle of social 
organization. Contrarily, there would seem to be lots of advantages to a free 
market in babies — for the individual, it would allow easy adjustment of family 
size, age composition, sex ratio, and so on, and for society, a division of labor in 
child rearing would allow better use of human resources. The sociobiological 
theory of kin selection explains why there are no societies with free trade in 
infants and why kinship is generally an important feature of social organization. 
If most of the historic context is taken as given, Darwinian arguments can be 
very powerful heuristics. This is especially clear for genetic evolution. For ex- 
ample, given haplodiploidy, a theory based directly on the expected equilibrium 
outcome of natural selection can make surprising and extremely fruitful pre- 
dictions about patterns of behavior in social insects. Who, for example, would 
have thought to connect sex ratio among reproductives and “slave making” in 
ant species? In recent years, similar ideas have been usefully applied to under- 
standing human behavior. For example, Hill, Kaplan, and their colleagues (re- 
viewed in Hill and Kaplan, 1988) have used theory from behavioral ecology to 
relate patterns of foraging, mate preference, and child care among Ache hunter- 
gatherers, and Borgerhoff-Mulder (1988) has explained variation in bride price 
among Kipsigis pastoralists in terms of parameters that predict future female 
fitness. 

Finally, it is useful in and of itself to know that even the most strongly 
functional Darwinian models can give rise to historical change. The same pro- 
cesses that give rise to convergence in one case can generate differences in an- 
other, given only small changes in the structure of the process or in initial 
conditions. Brandon (1990) argues that “why possibly” explanations are useful 
in evolutionary biology. By this, he means explanations that tell us how some 
character could have evolved are useful even if we cannot determine whether the 
explanation is true. The theoretical models in population genetics provide a good 
example: Hamilton’s (1964) kin selection models show how natural selection 
could give rise to self-sacrificial behavior. However, we usually do not know 
whether any particular case of altruism arose as a result of kin selection. The lack
of any “why possibly” explanation would cast doubt on other aspects of our
knowledge of how selection shapes behavior. 

Understanding how adaptive processes could give rise to historical change 
is useful for analogous reasons. There is considerable evidence that people’s 
choices about what to believe and what to value are affected by the consequences 
in material well-being, social status, and so on (e.g., Boyd and Richerson, 1985).
This view has a venerable history in anthropology (e.g., Barth, 1981; Harris,
1979), plays a foundational role in economics, and is taken for granted in many
historians’ explanations for particular sequences of events. If cultural change is 
affected by consequence-driven individual choice or natural selection, then it 
follows that there will be a process that will act to modify the distribution of 
cultural variation in a population in much the same way that natural selection 
changes genetic variation (Boyd and Richerson, 1985, chs. 4 and 5). The fact that 
functional processes like natural selection readily lead to history allows one to hold 
this view without having necessarily to search for external environmental dif- 
ferences to explain the differences among apparently similarly situated human 
societies. 


NOTES 

We thank James Griesemer, Matthew H. Nitecki, Eric A. Smith, and two anony- 
mous reviewers for most helpful comments on previous drafts of this chapter. 

1. This project is quite different from the better-known, classical studies of 
cultural evolution developed by Leslie White (1959) and other scholars in anthro- 
pology. This work focused descriptively on the large-scale patterns of cultural evo- 
lution rather than on the details of the processes by which cultural evolution occurs 
(Campbell, 1965, 1975). The research tradition White represents derives from the 
progressivist ideas of Herbert Spencer, rather than from Darwin. 

2. The additive genetic value of a particular individual for a particular character 
is the average value of that character for offspring produced when that individual 
mates at random with a large number of other individuals in the population. For 
example, the additive genetic value of a bull for fat content is the average fat content 
of all its offspring where mates were chosen at random. The distribution of genetic 
values is Gaussian when the probability that an individual has a given genetic value is 
given by the normal (or Gaussian) probability distribution. Genetic correlations exist 
when the distributions of genetic values for different characters are not
probabilistically independent. For example, if bulls with a higher genetic value for
size also tend to have a higher genetic value for fat content, then body size and fat
content are genetically correlated. Genotype-environment correlations arise when
individuals with the same genotype develop different phenotypes in different environments.
16 Are Cultural Phylogenies
Possible?

With Monique Borgerhoff Mulder and 
William H. Durham 


Biology and the social sciences share an interest in phylogeny. 
Biologists know that living species are descended from past species and use the 
pattern of similarities among living species to reconstruct the history of phylo- 
genetic branching. Social scientists know that the beliefs, values, practices, and 
artifacts that characterize contemporary societies are descended from past
societies, and some social science disciplines (e.g., linguistics and cross-cultural
anthropology) have made use of observed similarities to reconstruct cultural
histories. Darwin appreciated that his theory of descent, with modification, had 
many similarities of pattern and process to the already well-developed field of 
historical linguistics. In many other areas of social science, however, phyloge- 
netic reconstruction has not played a central role. 

Phylogenetic reconstruction plays three important roles in biology. First, it 
provides the basis for classification. Entities descended from a common
ancestor share novel, or derived, characters inherited from that ancestor. There- 
fore, it is possible to group them into hierarchically organized series of groups — 
species, genus, family, order, and so on in the biological case. 

Second, knowledge of phylogeny often allows inferences about history. 
The knowledge that humans are more closely related to chimpanzees and gorillas 
than to orangutans provides evidence that the human lineage arose in Africa. 
Phylogenetic reconstructions based on the characters of extant species or cul- 
tures often allow us to reconstruct the history in the absence of a historical, ar- 
chaeological, or fossil record. In practice, the history of many biological 
and cultural groups is so poorly known that only by combining phylogenetic 
and historical or archaeological information can reliable reconstructions be 
obtained. 
Third, entities descended from a common ancestor share features that may
constrain the pathways that more recent evolution has followed. For example, se- 
lection for terrestrial locomotion may lead to quadrupedal locomotion in a small 
monkey that runs along the tops of branches but to bipedal locomotion in a large 
arboreal ape that swings below branches (Foley, 1987). The latter pattern allows the 
hand to specialize in manipulative tasks and, on many accounts, is why the ape lineage, but
not the monkey lineage, eventually was able to produce a cultural species.

The importance of descent is the crux of some of the deepest controversies 
of all the historical sciences. Some social scientists and biologists (e.g., Boyd and 
Richerson, 1992; Hallpike, 1986; Sahlins, 1976) have argued that history strongly
constrains adaptation and, as a result, strictly limits adaptive interpretations of 
current behavior. As Francis Galton taught both biologists and social scientists in 
the nineteenth century, to account for the effects of common ancestry, the study 
of adaptation or function requires that patterns of descent be known. Our in- 
ability to provide appropriate roles for history and function is a chronic source of 
controversy. 

If the analogy is real, an interdisciplinary exchange of concepts and tools 
could pay great dividends. Social scientists may be particularly interested in the 
near-revolutionary developments in systematics (Ridley, 1986) and compara- 
tive methods (Harvey and Pagel, 1991) developed by evolutionary biologists in 
the last two decades. 

The purpose of this chapter is to examine the role of descent in culture 
evolution theory. We believe that the critical question is whether human cul- 
tures, or parts of them, are isolated from one another to the same degree as 
biological entities like species and genes. Cultures are frequently characterized 
by sharp ingroup-outgroup boundaries (LeVine and Campbell, 1972) that may 
function to limit the flow of ideas from one population to another (Boyd and 
Richerson, 1987). However, there are also many examples of the diffusion of 
cultural traits across such boundaries (Rogers, 1983). Are the isolating processes 
sufficiently strong to provide at least a core of important cultural traits that are 
sufficiently protected from diffusion so that phylogenetic analysis is possible? 
If so, concepts and methods from biological systematics can be used to reconstruct 
the history of cultures. If not, human cultures are more like subspecies or local 
populations linked by gene flow than like reproductively isolated species. In this 
case, it may be useful to make separate phylogenies for each subunit of culture 
that is substantially protected from diffusion, in much the same way that modern 
molecular procedures are used to reconstruct the phylogeny of subgenomic 
units, especially individual genes. It may also be that there are no cultural units 
with sufficient coherence and therefore that phylogenetic methods are useless. 

We begin by reviewing the notions of descent used in evolutionary biology. 
Biologists have been making use of the concept of descent ever since Darwin, and 
they have developed a sophisticated appreciation for the concept and its problems 
that may be helpful in the human case. The complexity and diversity of biological 
systems of inheritance is wondrous to those brought up on the simple Mendelism 
of 20 years ago (Falk and Jablonka, 1997). Although it is likely that the process 
of cultural descent with modification is different from the analogous process in 
organic evolution, we believe that much can be learned from biologists’ century 
of hard work. We then consider data from the social sciences that indicate the
extent to which cultures form bounded wholes, analogous to species. Finally, we 
consider how the descent concepts, partly borrowed from biology, might be used 
to tackle important questions in the social sciences. 


Descent in Organic Evolution 

In biology, two different entities exhibit clear patterns of descent with
modification. The most familiar example is the species. The collection of in- 
dividuals who make up a species during any generation is descended, and per- 
haps slightly modified, from the collection of individuals who made up the 
species during the previous generation. A new species is formed by the splitting 
of an existing species. Then each of the daughter species is descended from the 
single ancestral species that gave rise to them. 

Much the same holds for genes one by one. Because genes result from the 
copying of DNA, every gene is descended from the gene that provided its tem- 
plate. Modified genes arise from existing genes by mutation, recombination, and 
gene conversion at a given locus. A genetic locus can give rise to another locus 
by duplicating itself on the chromosome, after which the daughter locus begins 
independent evolution. The relationships among genes are not simply the relationships
among the species that carry them (although this is often the case).
We can keep track of the relationship of genes within a single species (e.g., various
forms of hemoglobin within human populations). It is also possible to speak of
relationships among genes that are inconsistent with relationships among species.
For example, genes for globin molecules in vertebrates and certain plants seem to
share a more recent common ancestor than the genes in vertebrates and arthropods,
as surprising as this seems at first blush (Jeffreys et al., 1983).

Descent relationships are often represented using branching diagrams like 
that shown in figure 16.1. The diagram conveys the idea that both A and B are 
descended from an ancestor C. (Systematists use similar branching diagrams 
called cladograms to represent patterns of similarity without reference to time, 
or ancestor-descendant relationships; statistical clustering algorithms create 
treelike dendrograms, also without any pretense to representing ancestor-descendant
relationships. Tree diagrams are used here to represent phylogeny.)
The same diagram is used to represent the relationship among different kinds of 
things. For biologists, A, B, and C may represent species or genes. Social scientists
use similar diagrams to express the relationship among languages, or other as- 
pects of culture, often with the explicit intention of representing a phylogeny. 
What, if anything, do the descents of genes and species have in common? Can 
these commonalities provide some help in analyzing the descent of cultures, 
languages, and technologies? 


The Descent of Genes 

To answer this question, let us begin with the simpler case: the descent relationship
among genes. If we ignore for a moment the possibility of recombination,
every gene is a copy of another gene. Of course, that gene was the copy of yet
another gene, and so on. Thus, if we pick any two genes, A and B, we can, in 
principle, trace back through a series of copies until we find a gene, C, that 
served as a template for both. We say that genes A and B are descended from C. 
If mutations have occurred, A or B may be different from C and each other. As 
long as mutations are rare and the gene includes enough bases, then genes that 
share more derived mutations are more likely to be related. Taxonomists use this 
fact to reconstruct the branching pattern among genes sampled from living 
species. Notice that there is nothing in the discussion that specifies that C, A, 
and B have to belong to the same (or different) species. The same argument
would hold regardless of whether A and B are genes found within a single species 
or among distantly related species (e.g., humans and bean plants). 
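
The tracing-back argument can be made concrete with a small sketch. The Python below is an illustration only: the copy history in `template_of`, the mutation labels, and the helper names are hypothetical and are not drawn from any data set or from a published method. It follows each gene back through its chain of templates to find the most recent shared template, and it counts the derived mutations two genes share, which is the quantity used above as evidence of relatedness.

```python
def ancestors(gene, template_of):
    """Return the chain of templates from `gene` back to the earliest known copy."""
    chain = [gene]
    while gene in template_of:
        gene = template_of[gene]
        chain.append(gene)
    return chain


def common_template(a, b, template_of):
    """Most recent gene that served, directly or indirectly, as template for both."""
    seen = set(ancestors(a, template_of))
    for gene in ancestors(b, template_of):
        if gene in seen:
            return gene
    return None


def shared_derived(a, b, derived):
    """Number of derived mutations carried by both genes."""
    return len(derived[a] & derived[b])


# Hypothetical copy history: each gene maps to the gene it was copied from.
template_of = {"A": "a1", "a1": "C", "B": "b1", "b1": "C", "D": "a1"}
# Hypothetical derived mutations (relative to C); m1 arose in the copy a1.
derived = {"A": {"m1", "m2"}, "D": {"m1", "m3"}, "B": {"m4"}}

print(common_template("A", "B", template_of))  # C
print(common_template("A", "D", template_of))  # a1
print(shared_derived("A", "D", derived), shared_derived("A", "B", derived))  # 1 0
```

In this toy history A and D share the derived mutation m1 and also share the more recent template, whereas A and B share no derived mutations and meet only at C. Nothing in the sketch refers to the species in which a gene is found.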

Units with Reticulated Phylogenies

Recombination (the shuffling of chromosomes, of the genes along a chromosome,
and of the sequence within a gene) complicates matters because it leads to
what cladists call reticulated phylogenies. Figure 16.2 shows the lineages of three
genes. Recombination has occurred within the gene three times. After each 
recombination event, each of the daughter genes is a copy of part of each of the 
two parents. The daughter genes are no longer descended from the parental 
genes in the same way that they were in the absence of recombination. They are 
no longer almost exact copies of the parents; rather, they are partial copies of 
both parents. Further recombination events create yet more complicated pat- 
terns of relationship. After some time, every copy of the gene is related to a large 
number of other genes in some complicated way that utterly obscures descent. 
Recombination within a gene is rare, but recombination within chromosomes 
between different genes is quite common. Deep phylogenies can be recon- 
structed for genes, but only shallow ones for chromosomes. 

Figure 16.2. Recombination leads to complicated patterns of descent. Each string of 
letters represents a segment of the chromosome. Each generation each gene is 
replicated, sometimes with recombination. After four generations, each chromosome is 
partly descended from all three of the original chromosomes. 
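
The pattern described in the caption to figure 16.2 can be imitated in a few lines of Python. This is a sketch under stated assumptions, not a reconstruction of the figure: the three originals named "one", "two", and "three", the seven segments, and the crossover positions are all hypothetical. Each segment is tagged with the chromosome it started on, so that the mixed ancestry of a recombinant can be read off directly.

```python
def recombine(parent1, parent2, crossover):
    """Daughter copies parent1 up to `crossover` and parent2 from there on."""
    return parent1[:crossover] + parent2[crossover:]


def sources(chromosome):
    """Set of original chromosomes that contributed at least one segment."""
    return {origin for origin, _ in chromosome}


# Three hypothetical original chromosomes of seven segments each; every
# segment is tagged with the chromosome on which it started.
originals = {
    name: [(name, seg) for seg in "ABCDEFG"] for name in ("one", "two", "three")
}

gen1 = recombine(originals["one"], originals["two"], 3)
gen2 = recombine(gen1, originals["three"], 5)

print(sorted(sources(originals["one"])))  # ['one']
print(sorted(sources(gen1)))              # ['one', 'two']
print(sorted(sources(gen2)))              # ['one', 'three', 'two']
```

After only two recombination events the daughter is a partial copy of all three originals, so no single parent can be called its ancestor in the sense that applies to an unrecombined gene.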


Gene flow (migration) among subpopulations of a species has a similar
effect. Any given local group will have acquired genes from many different local 
groups in the past. Even if most subpopulations are created by the subdivision of 
a single parental population, a relatively small rate of individual-level migration 
between subpopulations will carry genes evolved in one daughter subpopulation 
to its sisters. Fairly shortly, descent at the subpopulation level will be impossible 
to detect. Thus, there is a large range of genetic units, from roughly the size of small
chromosome segments up to the subspecies, for which phylogenetic
analysis is usually impossible. 
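
How quickly a small rate of migration erases differences can be seen in a deliberately simple model. The sketch below assumes two subpopulations of equal size that exchange a fraction `m` of their members each generation, and it ignores selection, drift, and mutation; the function names and the numbers are hypothetical choices made for illustration. Under these assumptions the difference in the frequency of any variant shrinks by a factor of (1 - 2m) per generation.

```python
def migrate(p1, p2, m):
    """One generation of symmetric migration at rate m between two subpopulations."""
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1


def generations_until_similar(p1, p2, m, tolerance):
    """Count generations until the frequency difference falls below `tolerance`."""
    if not 0 < m < 1:
        raise ValueError("m must lie strictly between 0 and 1")
    t = 0
    while abs(p1 - p2) >= tolerance:
        p1, p2 = migrate(p1, p2, m)
        t += 1
    return t


# A variant fixed in one daughter subpopulation and absent from its sister,
# with 5 percent of each subpopulation exchanged per generation.
print(generations_until_similar(1.0, 0.0, 0.05, 0.01))  # 44
```

With these illustrative numbers, a variant that fully distinguishes the two subpopulations differs between them by less than one percentage point after 44 generations.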

Some large gene collections, such as mitochondrial genomes, are protected 
from recombination because they are transmitted asexually. Mitochondrial 
phylogenies of some depth can be constructed, although they illustrate another 
process that eliminates phylogenetic information in the long run. Mitochondria 
are subject to high mutation rates. In a matter of a few million years, every 
descendant pair of mitochondrial genes will have independently mutated more 
than once, and the traces of descent will be lost. Conservative genes like the 
cytochrome genes have slow rates of evolution and can be used to reconstruct 
phylogenetic relationships reaching back to near the origin of life, but these are
exceptional. More typically, deep phylogenetic reconstructions based on less 
faithful structures are quite controversial even when we can be almost certain 
that recombination and migration have not confused the picture. 
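
The loss of historical traces through rapid change can be given a rough quantitative form. The sketch below assumes that each site mutates independently with a fixed probability `mu` per generation and that two lineages have been separate for a given number of generations; the rate and the time spans are hypothetical round numbers, not estimates for mitochondria or any other real system. Only sites that have mutated in neither lineage still carry an unaltered trace of the common ancestor.

```python
def fraction_untouched(mu, generations):
    """Expected fraction of sites mutated in neither of two lineages since their split."""
    return (1 - mu) ** (2 * generations)


# Hypothetical per-site mutation probability of one in a million per generation.
mu = 1e-6
print(round(fraction_untouched(mu, 100_000), 3))  # about 0.82
print(fraction_untouched(mu, 5_000_000) < 1e-4)   # True
```

Under these assumptions most sites are still untouched after a hundred thousand generations, but almost none are after five million. A unit that changes ten times more slowly retains the same amount of information for ten times as long, which is the sense in which conservative genes permit deeper reconstructions.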

The Descent of Species 

Species and higher taxa are the classic focus of phylogenetic analysis in biology. 
Linnean systematists formalized the common observation that the organic world 
comes in readily observable clusters. Species and higher taxa seem to be sepa- 
rated by distinctive gaps that do not occur within species or among many other 
natural objects. Darwin’s theory of descent with modification gave a theoretical 
underpinning to the trees of relationships that Linnaeus had enshrined in a 
hierarchical classification system, although Darwin had little to say about the 
species-isolating mechanisms that enforce the gaps between species. His fol- 
lowers have made up for this deficiency; the issue of speciation is a major topic in 
modern evolutionary biology. 

In the basic picture constructed by the architects (e.g., Ernst Mayr) of the
midcentury neo-Darwinian synthesis, species are created when a barrier to gene 
flow evolves to isolate two sets of populations. Once isolated, the evolution of 
the two new species is independent, and slowly changes accumulate due to 
natural selection, genetic drift, mutation, and so forth. There may be some 
evolutionary differentiation within a population due to selection or drift. But 
interbreeding among populations unites a species, whereas absolute speciating 
barriers definitively separate them from other species. Over the long run, species 
become different enough to be classified as new genera, families, orders, and so 
on, up Linnaeus’s hierarchy. In the classic picture, complete isolation and the 
slow accumulation of differences allow for the reconstruction of relationships of 
descent by splitting over great time depths. 

The basic picture provides a clear causal explanation of the temporal and 
spatial coherence of species. Advocates of the biological species concept hold 
that only when this picture applies do we have species, properly speaking. 
However, several lines of evidence suggest that the absence of gene flow is 
neither necessary nor sufficient for the existence of coherent species in the sense 
of lumpy entities that show clear evidence of descent. Species can maintain their 
coherence without gene flow within the species, and species boundaries may be 
maintained despite gene flow between species. 

Some species have maintained species-typical phenotypes, including the 
ability to form fertile hybrids despite long periods without any gene flow. For 
example, the checkerspot butterfly is found in scattered populations throughout 
California. Members of different populations are very similar morphologically 
and are all classified as members of the species Euphydryas editha. However, 
careful study has shown that there is virtually no gene flow among widely sep- 
arated populations (Ehrlich and Raven, 1969). There are also many examples 
(Levinton, 1988) of cryptic “sibling” species that are long isolated but have 
evolved no detectable morphological differences. Some taxonomists claim that 
it is no more difficult to detect species in asexual organisms than it is in sexual 
organisms (e.g., Mishler and Brandon, 1987), despite the fact that there is no
gene flow to unite asexual populations. 

Some species persist despite substantial gene flow (Barton and Hewitt, 
1989). A hybrid zone can exist between what seem to be good species, and often 
a few genes have clearly leaked across the boundary from one species to another. 
It would seem as if such species must either be formerly geographically isolated 
subspecies that will hybridize away or incipient species that will eventually 
evolve an isolating barrier. In fact, active hybrid zones between rather distinct 
species sometimes persist for long periods of time. Selection can apparently 
maintain the coherence of species both without any help from gene flow and in 
the face of substantial amounts of it. 

Things are not always so neat. In bacteria, genes are frequently transmitted 
horizontally among lineages (Eberhardt, 1990). Bacterial DNA exists in two dis- 
tinct forms: most of the DNA is contained in a large chromosome, but about 
1 percent is contained in small loops of DNA called plasmids. The two forms of 
DNA are transmitted differently. For the most part, the chromosomal DNA is 
transmitted vertically. When bacteria divide, the chromosomal DNA is duplicated, 
and each daughter cell contains a copy. In contrast, plasmid DNA is transmitted 
horizontally from one bacterium to another during conjugation. Moreover, bacteria
that are classified as belonging to different genera or families according to their 
chromosomal DNA readily conjugate and exchange plasmid DNA. As a result, 
genes carried on plasmids may jump from one lineage to another quite distant one. 
It is not certain that the two types of DNA are completely separate. Sometimes 
plasmid DNA may be incorporated into the chromosome, although if this occurs it 
is probably quite rare (Eberhardt, 1990). In the case of bacteria, there are really 
two sets of phylogenies: one for the chromosomal DNA and one for the plasmid
DNA. Relationships between these phylogenies break down rapidly because of
the horizontal transmission of plasmids across chromosomal lineages. 

The opposite situation occurs with the lineages of hosts and parasites and 
predators in many animals and plants. For example, ectoparasites like lice and 
fleas are often isolated within their hosts, so that host and parasite phylogenies 
are similar despite no transfer between host and parasite genomes. 


The Common Properties of Genes and Species 

Genes and species are units at quite different levels of organization. For them, but 
not units between them on the scale of organization, deep phylogenies can usually 
be constructed. The reason is a pair of similarities. First, both units are replicated 
with great fidelity and change slowly due to ongoing evolution. Second, when 
daughter genes and species change, these changes are not effectively shared with 
sister lineages by mixing or any other form of communication. For systems with 
high rates of change, like mitochondrial genomes, deeper descent is obscured 
because recently evolved differences completely obliterate the ancient similarities 
that are necessary to detect descent. In the case of units like chromosomes and 
local populations with high rates of mixing, descent is generally untraceable be- 
cause descent-derived differences are erased as rapidly as they arise. 
Genealogy is by itself not enough to generate much descent. There is a
hierarchy of genealogical entities in biology: genes, chromosomes, individuals, 
populations, species, and communities. These are genealogical entities because 
they are all descendants of other entities at the same level. In the face of rapid 
mixing or evolution (or both), genealogy alone cannot preserve detectable patterns
of descent, at least not for long. Note that patterns of descent are a matter
of timescale. If we are interested in relationships over only a few splittings of 
daughter entities, these may be detectable in the face of considerable mixing and 
high rates of evolution. If we want to know relationships traceable many splits 
ago, the criteria are more demanding. 


Reconstructing Cultural Phylogenies 

Can we apply these ideas from biology to the analysis of human culture? Dar- 
winian models of cultural evolution hold that culture is information transmitted 
from individual to individual by imitation, teaching, and other forms of social 
learning. Various processes cause the pool of cultural variants that characterizes a
population to change through time.

This view of culture and cultural evolution implies the existence of a hi- 
erarchy of genealogical entities analogous to the genealogical hierarchy of organic 
evolution. We do not know what is the smallest unit of cultural inheritance 
because we do not know in detail how culture is stored in brains. Nevertheless, 
scholars have proposed histories of quite small elements: particular words, 
particular innovations, elements of folk stories, and components of ritual prac- 
tice. Such small elements are linked together in larger, culturally transmitted 
entities: systems of morphology, myth, technology, and religion. Such medium-scale
units are collected together into “subcultures” and “cultures” that characterize
human groups of different scales: kin group, village, ethnic group, nation,
and so forth. Cultural subunits sometimes crosscut one another in complex
ways, as when religion or occupation crosscuts ethnicity (much like bacterial 
chromosomes and plasmids).

Four Hypotheses 

Reconstructing cultural phylogenies is possible to the extent that there are ge- 
nealogical entities that have sufficient coherence, relative to the amount of 
mixing and independent evolution among entities, to create recognizable history. 
There is a continuum of possible views about what units in the hierarchy of 
cultural descent satisfy these desiderata. It is useful to identify four regions along 
this continuum. 


Cultures as Species 

Cultures are isolated from one another or are tightly integrated. They contain 
within them powerful sources of isolation (ethnocentric discrimination against 
strangers) or coherence (such as organizing systems of thought that act as biases
against ideas one by one, rather than strangers as whole individuals). Both
mechanisms could cause cultures to act as single entities or “individuals” in the 
course of cultural evolution (see e.g., Marks and Staski, 1988). By one mecha- 
nism or another, there is little cross-cultural borrowing of any significance. New 
cultures are formed completely by the fissioning of populations and subsequent 
divergence. In this case, whole cultures are analogous to species or mitochondrial 
genomes. Biological methods of systematics can be applied almost intact, and 
deep cultural phylogenies are relatively easy to infer for at least the bulk of a 
people’s culture. 


Cultures with Hierarchically Integrated Systems 

Although cross-cultural borrowing may be frequent for many peripheral com- 
ponents, a conservative “core tradition” in each culture is rarely affected by 
diffusion from other groups. New core traditions mainly arise by the fissioning 
of populations and subsequent divergence of daughter cultures. Isolation and 
integration protect the core from the effects of diffusion, although peripheral 
elements are much more heavily subject to cross-cultural borrowing. In this case, 
core traditions are analogous to the bacterial chromosomes and the peripheral 
components to plasmids. Biological methods of systematics can be modified to 
deal with cross-cultural borrowing. Reasonably deep core-cultural phylogenies 
can still be inferred, but this requires disentangling the effects of borrowing by 
distinguishing core and peripheral elements, and especially by methods to iden- 
tify elements that “introgressed” into the core. 


Cultures as Assemblages of Many Coherent Units 

Cultures could be quite ephemeral assemblages of small units, but the latter may 
have limited mixing and slow evolution. Culture may have no species, but it 
might have genes, plasmids, and mitochondria. Different domains may have 
different patterns of inheritance and different evolutionary histories. The components
may be fairly large, plasmid- or mitochondrion-like, such as language, or
small, solitary memes, such as the idea of using a magnetized needle to point 
north. Any given culture is an assemblage of many such units acquired from 
diverse sources. Methods of phylogeny can be applied independently to each 
domain. The essential problem is to determine the boundaries of the domains 
and establish that they are stable in time and space. 


Cultures as Collections of Ephemeral Entities 

There are no observable units of culture that are sufficiently coherent for phy- 
logeny reconstruction to be useful. Observable aspects of culture could be the 
result of units that are beneath the resolution of current methods to observe. The 
forms of Acheulean hand axes are so similar that they cannot be used to infer 
anything about descent among their makers. Perhaps there were really many 
traditional ways to reach this apparently uniform end result. If we knew the 
details, we could reconstruct cultural phylogenies of hand ax making. There may 
be observable differences, but if they are the product of many recombining
elements that cannot be observed, there is no information that would allow us 
to construct a phylogeny of the bits. Alternatively, if cultural evolution is suffi- 
ciently rapid, behavior may reflect such recent history that all phylogeny is lost. 
The “jukebox” culture, in which cultures are rapidly modeled and remodeled to 
serve current adaptive purposes, would have this effect due to functional con- 
vergence rapidly destroying any trace of history. 

There are two issues at stake. First, when using the term descent, what do we 
mean? Proponents of the view that whole cultures are like species use descent 
to describe cultural replication of complex coherent groups by the mechanism 
of group fission or budding, whereas those who believe that only components of 
culture cohere would use descent to describe ancestor-descendant relationships 
resulting from any pattern of culture preserving the footprints of its history. We 
shall try to be clear in our own usages, but this is a merely terminological issue to 
which we devote no further space. Second, what is the world like? This is a much 
more interesting question, to which we devote the rest of the chapter. At one end 
of the continuum, all of the elements that make up a culture cohere and resist 
recombination. Cultures as a whole are analogous to species. At the other end, 
the observed elements of culture are the result of memes diffused or invented on a 
timescale too short for phylogenetic reconstruction. What is culture really like? 

Mechanisms 

Several general mechanisms might cause longevity and coherence in cultural 
units so that descent can be determined. 


Longevity of Historical Traces 

As in the case of genes, the phylogenetic process of cultural transmission provides 
some level of historical continuity. As with genes, the deepest phylogenies are 
possible when culture changes slowly and is not subject to functional conver- 
gence. Slow evolution will occur when people either cannot, or have no reason to, 
invent new forms. Surprisingly simple bits of culture are often apparently too 
obscure to reinvent, and all known modern exemplars derive from a single in- 
vention. Needham (1988) gave many plausible examples of Chinese technology 
that subsequently diffused to the rest of Eurasia (e.g., the magnetic compass). 
Nonetheless, in the long run, functional convergence seems to be the rule for 
technology. A long tradition in the social sciences, including the classic cultural 
ecology of Steward (1955) and modern evolutionary anthropology, trades on
the reality of substantial convergent evolution in human cultures. As in the bio- 
logical case, the best elements for historical analysis are those that are functionally 
arbitrary and symbolic. Language and other symbolically meaningful, but non- 
functional, variations are often used as indices of descent, much as functionally 
neutral flower form is used in plant systematics. Flowers are a plant's way of 
communicating with pollinators, so the analogy with language is real. 

The next subsection describes some mechanisms that may prevent mixing 
between coherent elements. Similar mechanisms may act to slow the rate of 
evolution if internal innovations or innovators are perceived as strange, either
because of a poor internal fit or because they arouse suspicions of heresy or 
deviance on the part of innovators. 


Processes That Give Rise to Coherence 

What general processes could give cultural elements an enduring coherence, 
leaving aside the size of cohering units and their relation to one another? In the 
symbolic and interpretive anthropology literature, the “glue” has been attrib- 
uted to the “meaning” that inheres in culture. Meaningful cultural information 
provides a convincing and compelling Weltanschauung for its bearers. Mean- 
ingful components help organize and make sense of other parts of the cultural 
system and natural world. They also legitimize and justify the system in the 
minds of its bearers. For this reason, meaningful components have variously been 
called “root paradigms” (Turner, 1977), “ultimate sacred postulates” (Rappaport,
1979), “core principles” (Hallpike, 1986), and the like. Because they are
critically important to a people’s understanding of the world and its place within
it, they often have a special, even sacred, status. The notion of meaning is often
linked to the idea of cultural holism. There is no logical reason for this limitation, 
and the idea may apply to cores or much smaller units. Subcultural units as small 
as the individual social scientific disciplines, street gangs, and clans often appear 
to have well-articulated systems of meaning. 

The special status of meaningful elements could provide coherence in sev- 
eral ways. First, the internal logic of a coherent block of culture may discriminate 
against intrusive elements. Diffused elements may be known to individuals, but 
the mismatch of meanings between whole cultures or subcultures entails that 
“foreign” values and ideas be misunderstood, disliked, and neglected. The 
mismatch may be between foreign elements but also between domains within a 
single culture (e.g., gender-marked identities or even sets of subsistence skills).

Second, meaningful culture often involves markers of group identity that are 
especially salient to the definitions of ingroup and outgroup. Contexts where co- 
herent units of meaning-rich culture are available for acquisition from foreigners 
are likely to involve marked ritual observances or ceremonies that mobilize 
ethnocentric sentiments more thoroughly than mundane contacts like trade, in 
which symbolically less marked elements may diffuse readily. Ethnocentrism can 
provide an effective isolating barrier to diffusion of cultural elements in theory 
and apparently in practice (Boyd and Richerson, 1987) at the whole-culture level.
Class, caste, gender, occupation, and even hobby groups are symbolically marked 
within some societies. Within bounded groups, however large they may be, in- 
termarriage, diffusion, and other mixing processes create cultural uniformity, but 
there are sharp differences among them. This is a form of indirect bias. 

Third, to the extent that what coheres in culture is a symbolic system of 
organizing meanings, rather than the meanings themselves, it is protected from 
ordinary adaptive evolutionary pressures. In language at least, the symbol system 
is so rich and flexible that quite novel new meanings can be coded with the 
existing system; only linguistically trivial changes in lexicons were needed to 
adapt modern languages to the industrial revolution. 

Finally, elements may cohere because certain combinations are adaptive and
favored by natural selection or derivative adaptive decision-making rules. 
Adaptive forces may simply discriminate so strongly against recombinants that 
coherence is maintained despite massive mixing, as seems to be the case in 
certain hybrid boundaries in the biological case (Barton and Hewitt, 1989). A
related sort of selective “glue” could come from the multiplicity of evolution- 
arily stable strategies that seem to exist in social systems. Perhaps the stability of 
coherent features comes from the failure of new or foreign social practice to fit 
into actual arrangements, rather than from inconsistencies at the cognitive/ 
affective meaning level. The symbolic or ideological level may follow the social, 
rather than dictate it. 

Rushforth and Chisholm (1991) gave a possible example in their discussion 
of Athapaskan “structures of communicative social interaction.” According to 
these investigators, a core “framework of meaning and moral responsibility” has 
persisted among Bearlake Athapaskan of northern Canada with “extraordinarily 
little change” across many generations and hundreds of years (p. 64). Moreover, 
remarkably similar beliefs and values — urging industriousness, generosity, au- 
tonomy, and restraint — have been documented among more than 30 other 
Athapaskan-speaking peoples across three geographically discontinuous clusters 
in Canada and Alaska, the Pacific Northwest, and the American Southwest. 

A deeply rooted family of social norms such as these might directly underpin 
social institutions. The norms that underpin social interactions are good candi- 
dates to be maintained as a coherent block because they are part of local evo- 
lutionarily stable strategies. In game theory, at least, it is easy to imagine locally 
and evolutionarily stable strategies for complex social institutions that are im- 
possible to change at the margin by either diffusion or within-lineage change 
because small movements away from current practice are disadvantageous. 
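
As a toy illustration of this point, and not a model taken from the studies cited here, the sketch below runs standard two-strategy replicator dynamics on a coordination game between two competing norms. The payoff numbers, the function names `step` and `settle`, and the labels "norm A" and "norm B" are all assumptions chosen for the example. Each norm pays off only when matched with itself, so both conventions are locally stable, and a population that moves a short way from either one is pulled back.

```python
# Hypothetical payoffs: row = own norm, column = partner's norm.
# Matching on A pays 2, matching on B pays 1, mismatches pay 0.
PAYOFF = ((2.0, 0.0), (0.0, 1.0))


def step(x, payoff, dt=0.1):
    """One Euler step of replicator dynamics; x is the share using norm A."""
    (aa, ab), (ba, bb) = payoff
    fit_a = aa * x + ab * (1 - x)            # expected payoff to norm A
    fit_b = ba * x + bb * (1 - x)            # expected payoff to norm B
    mean_fit = x * fit_a + (1 - x) * fit_b   # population average payoff
    return x + dt * x * (fit_a - mean_fit)


def settle(x, payoff=PAYOFF, steps=2000):
    """Iterate the dynamics from a starting share x and return the end state."""
    for _ in range(steps):
        x = step(x, payoff)
    return x


if __name__ == "__main__":
    # Small departures from each convention are reversed.
    print("start at 0.90 share of A ->", settle(0.90))
    print("start at 0.10 share of A ->", settle(0.10))
```

With these assumed payoffs the unstable tipping point lies where one-third of the population uses norm A (the share x at which 2x = 1 - x). Marginal change that stays on one side of that point is undone, which is the sense in which such institutions are hard to change by small steps.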

Would the multiple evolutionarily stable strategies (ESS) explanation ac- 
count for the remarkable cultural persistence of Athapaskan norms? Focusing on 
the Bearlake version, Rushforth and Chisholm (1991) suggested that “the 
Bearlake interpretive scheme has persisted because of the historically stable 
composition of the [social interaction] strategies it informs” (p. 119). They 
argued that Bearlakers pursue goals in daily life that are defined and valued by 
their interpretive framework of beliefs and values. The interactions that follow 
generate regular rewards or “payoffs” that encourage individuals to convey 
certain intentions to others. But the actions that convey these intentions are 
precisely those defined by the framework. In short, the framework persists as “an 
unintended consequence of the strategic behavior of individuals operating in 
their own interests” (p. 121). 

Sometimes coherent traditions are “acquired” by imposition by an invading, 
dominant culture, or assimilation to an attractive one. Even in this case, little 
admixture from the competing coherent structure of the adopting culture need 
result from its transfer from one biological population to another, as in the im- 
position of a common Greco-Roman urban civilization on a host of “barbarian” 
peoples in ancient Europe and Western Asia. Note that individual people can 
move readily without disturbing the integrity of the coherent elements, as the 
assimilation of many immigrant people to at least aspects of Anglo-American 
culture over the last two centuries testifies. Nevertheless, replication by transfer
to a new biological population is arguably normally accompanied by much mixing 
of old and new, and the fission of one population into two daughters probably 
conserves coherence more effectively. Similarly, high rates of immigration need 
not necessarily result in high rates of erosion of coherence, but cultural diffusion 
does seem likely to be stimulated by immigration in typical cases. 


Evidence 

The Descent of Cultures as Wholes 

Commentators such as Marks and Staski (1988) sometimes imply that they 
defend this position. According to McNeill (1986), historians such as Toynbee 
imply a position as extreme as this end of our continuum, although without any 
specific defense. McNeill’s own magisterial Rise of the West was written to 
demonstrate how it was not possible to write a world history without ac- 
knowledging the exchange of ideas among major culture areas, much less within 
them. Holistic arguments, ultimately deriving from Wittgensteinian philosophy, 
once enjoyed great appeal in history and many branches of the social sciences, 
and echoes remain. For example, in linguistics, de Saussure (1959) is often cited 
as a proponent of extreme systemicity in language, and even today some linguists 
espouse this view (Wardhaugh, 1992). The limitations of such arguments have 
long been recognized by philosophers, and more recently by social scientists. 
There is such overwhelming evidence for substantial diffusion and rapid evolu- 
tion in many components of culture that it is unlikely that any tenable empirical 
defense of a completely holistic cultures-as-species position can be offered. 


The Descent of Core Traditions 

The hierarchical hypothesis of large-scale cultural coherence rooted in a core 
tradition is a point along the continuum that warrants closer examination. Like the 
previous hypothesis it assumes that culture is an ideational system (i.e., it con- 
sists of widely shared ideas, values, and beliefs that shape behavior in local 
human populations; the named cultures of anthropologists). In this model, 
cultures are viewed as hierarchically integrated systems, each with its own in- 
ternal gradient of coherence. At one extreme in the gradient are the “core” 
components of a culture — those ideational phenomena that constitute its basic 
conceptual and interpretive framework and influence many aspects of social life. 
At the other are peripheral elements that change rapidly or are widely shared 
by diffusion. On this hypothesis, the processes of coherence generate one main, 
central core unit. But this central unit does not equally organize all elements of 
culture. There may be many other smaller elements that are only lightly or not at 
all influenced by the core. 

Core versus Periphery. Regardless of whether the core gets its coherence from 
meaning, protection, diffusion, structured social interaction, or from all these 
sources, the key assertion of this model is that core components exhibit 
a remarkable resilience in the course of cultural history. The core “sticks together”
as a cohesive bundle even through repeated episodes of culture birth,
giving rise to a set of descendant branches that then share the same “tradition.” 
As Vansina (1990) argued based on his case study, such traditions are based upon 

the fundamental continuity of a concrete set of basic cognitive patterns
and concepts. . . . [The] continuity concerns basic choices which, once
made, are never again put into question. . . . These fundamental acquisitions
then act as a touchstone for proposed innovations, whether from
within or without. The tradition accepts, rejects, or molds borrowings
to fit. It transforms even its dominant institutions while leaving its
principles unquestioned. (p. 258)

Despite these numerous sources of cohesion, the hierarchical hypothesis 
holds that many “peripheral” components exist that are only loosely tied to the 
core framework. These diffuse freely and readily, as in the well-studied case of 
technical innovations (Rogers, 1983). Peripheral components may include ide- 
ational elements that make sense on their own and can be socially transmitted 
without a lot of supplementary cultural information. Such components are as- 
sumed to play little or no organizational role within the broader ideational 
system, and they must be relatively easy to learn. Such components are expected 
to be highly “contagious,” rather like Dawkins’s (1993) viruses of the mind. 

New forms will be adopted quickly, simply, and smoothly, particularly if 
there is some perceived functional advantage and low cost. In this instance, 
change is quick and easy: different components come and go as independent 
interchangeable parts. They are likely to spread horizontally among cultures, 
regardless of whether those cultures are related historically by branching. For 
this reason, their phylogenies will have the vine-like appearance mentioned 
earlier. Kroeber (1948) gave a long list of well-known examples (e.g., days of the 
week, tobacco, printing, paper, gunpowder, etc.). Unlike the descent-of-wholes
hypothesis, the hierarchical hypothesis recognizes that cores are not as com- 
pletely isolated as good biological species. Kroeber’s “tree of culture” implies 
that cultural descent is like a rain forest canopy tree — one whose crown is a 
tangle of branches (related by birth) and vines (related by diffusion). For some 
substantial period of time, one can easily distinguish what grows as branches
from what grows as vines and, with more care, can do so even in a thick, old tangle. Eventually,
however, over the course of thousands of years, vines will proliferate and come 
to obscure the branches. At the same time, processes of coherence will integrate 
elements with separate histories. Old vines will coalesce to form a solid trunk — 
much like the strangler fig that starts out as a viny parasite of a tree, but gradually 
forms a solid trunk about its host, which then dies. 

The hierarchical model also acknowledges the rapidity of cultural evolution, 
compared with the biological case. The evidence of a history of common descent 
will gradually disappear in independent lineages. Barth (1987) gave a detailed 
account of the rapid evolution of the core tradition of the Mountain Ok of New 
Guinea due to a mutation-like process. The case is probably unusual because the 
core traditions are transmitted in rare secret rituals that create high “mutation” 
rates via forgetting. But even in the absence of diffusion, evidence of common 
ancestry in sister cultures will degrade on the millennial timescale (compared
with hundreds of millions of years, in the case of sister species of mammals). We
know from the massive convergence of agricultural technology and state-level 
social institutions in the pre-Columbian New and Old Worlds that cultural 
evolution can produce spectacular adaptive change on the timescale of a few 
thousand years. We can almost be certain that Old-New World similarities were 
independently derived convergences, but only because we have the evidence of 
hundreds of cultures on both branches to help distinguish the vines. Notoriously, 
careless historians who ignore the massively redundant evidence have no trouble 
“finding” false descent relationships between Old and New World cultures (e.g., 
Heyerdahl, 1950).

The Practice of Constructing Core-Cultural Phylogenies. The hierarchical hypoth- 
esis is supported to the extent that it can be shown that a large complex of core 
traits has a common pattern of descent. The core traditions in question must be 
related through a sequence of population fissionings (allowing for the odd core 
transfer). The existence of only one deep element, such as language, cannot be 
used alone to infer the existence of a full core of shared traditions among cultures 
related by language only. Because language phylogenies can be traced to consid- 
erable depth using conservative aspects of vocabulary and phonology, language 
trees are the usual starting point for attempting to trace out the descent patterns 
of larger core units. Related traditions can then be used as a basis for reconstructing 
a fuller culture history, including the “proto-tradition” out of which they evolved 
(see Aberle, 1984, 1987). Sometimes genetic relatedness of the populations in- 
volved provides supplementary evidence, given that full core replication by pro- 
cesses other than fission of a parent culture is unusual. However, if diffusion and 
rapid evolution swamped all traces of relationship by birth, anthropology could 
not speak of branches, only vines, and hypothesis 3 would be supported. 

The work of Rushforth and Chisholm on Athapaskan similarities illustrates 
the method. Linguistic evidence indicates that Athapaskans are part of a second 
wave of Native Americans that arrived from Asia a few thousand years after the 
migration that contributed most known pre-Columbian populations. At contact, 
the Athapaskan language family was spoken by people in quite isolated clusters 
in Canada, California, and the Southwest (the Southwestern group includes the 
famous Apache and Navajo). According to their analysis, the evidence suggests 
that a core of meaning related to social behavior coheres with language and that 
all are “cognate” (i.e., related historically by culture birth; Rushforth and
Chisholm, 1991). 

First, the authors implied that the pertinent beliefs and values in Athapaskan 
populations are distinct from those of the surrounding populations belonging to 
other language groups (although it is also true that the differences are not thoroughly
documented in their presentation). Second, similarity by diffusion can be
ruled out because of the highly discontinuous geographical clustering of the 
carrier populations. Third, independent origins are highly improbable (Rushforth 
and Chisholm, 1991), even if each cluster of populations is taken as a whole. 

Rushforth and Chisholm (1991) concluded that the pertinent beliefs and 
values are all “genetically” related, having “originated in, and developed from, 
a common, ancestral cultural tradition that existed among Proto-Athapaskan or,
perhaps, even among [the ancestral] NaDene peoples” (p. 71). As they put it,
“simplicity strongly argues” that “this cultural framework originated once, early
in Proto-Athapaskan or NaDene history and has persisted (perhaps with some 
modifications) in different groups after migrations separated them from contact 
with each other” (p. 78). 

The work of Indo-Europeanists to reconstruct the descent of societies 
speaking this family of languages is the most ambitious attempt yet made to 
reconstruct a pattern of descent for a core. According to some Indo-Europeanists 
like Georges Dumézil and Marija Gimbutas, the Indo-Europeans are the bearers
of a core tradition consisting of language elements, myths, and a distinctive 
tripartite pattern of social organization that had its origin in a particular culture 
of steppe horse nomads. Gimbutas’s reconstructed “Kurgans” lived about 6,500
years ago between the Black and Caspian Seas. Her Kurgan proposal is widely 
respected but also widely criticized; a reconstruction of such breadth and depth 
tests the margins of the hierarchical hypothesis (Mallory, 1989). 

Shared core traditions have been proposed for people in a number of dif- 
ferent regions of the world, each with time horizons dating back at least a few 
thousand years. Recently reviewed in Durham (1992), these include the oft- 
cited case of cultural similarity among Polynesian islanders (see especially Kirch, 
1986; Kirch and Green, 1987; see critical review in Terrell, 1986), the Atha- 
paskan (Rushforth and Chisholm, 1991) and Indo-European traditions men- 
tioned earlier (e.g., Gamkrelidze and Ivanov, 1990; Hallpike, 1986; but see 
Mallory, 1989), Mayans (Vogt, 1964), Tibetans (Durham, 1991), and Tupi 
speakers among native South Americans (Durham and Nassif, 1991). Although 
one could always argue that the Polynesian case is exceptional because of the 
inherent isolation of its populations, plausible examples of enduring shared 
traditions among cultures related by birth have now been proposed for a diverse 
array of continental populations as well. 

Consider Vansina’s (1990) recent comprehensive study of political tradition
in equatorial Africa. Through a controlled comparison of some 200 distinct
societies in the basin of the Zaire river and its tributaries, Vansina concluded that
these “widely differing societies arose out of [a] single ancestral tradition” (p.
191) by way of 3,000-4,000 years of historical transformations. As reconstructed 
by Vansina, the original ancestral tradition came into the region with the im- 
migration of western Bantu-speaking farmers. They brought with them a single 
distinct pattern of social organization based on fragile temporary alliances into 
House (capital H in original), village, and district, and a common ideology and 
world view to go with it (see Vansina, 1990). 

From this common baseline, Vansina (1990) argued, through successive 
splits, migrations, and expansions, “widely differing societies arose out of the 
single ancestral tradition by major transformations” (p. 191). The variation in- 
cluded, for example, two kinds of segmentary lineage societies, four kinds of 
associations, and five kinds of chiefdoms or kingdoms. All the while, “the
principles and fundamental options inherited [at birth] from the ancestral tra- 
dition remained a gyroscope in the voyage through time: they determined what 
was perceivable and imaginable as change” (p. 195). 

Vansina made it clear that outside influences — “the new habitants, the
autochthons [indigenous hunter-gatherers in the region], the non-Bantu, the
eastern Bantu farmers with their different legacies — each influenced the development
of this ancestral tradition differently from place to place” (p. 69). Yet as
he repeatedly showed, change “was not mainly induced by outside influences. In
all these cases [for example, in the inner Zaire basin] a chain of reactions fed
continuous internal innovations. Outside innovations were accepted only insofar
as they made sense in terms of existing structures” (p. 126). Even in regions
where external influences played a relatively heavy role, the internal sovereignty 
of distinct polities meant that “internal dynamics always remain determining” 
(p. 192). Even with the establishment of Atlantic trade after 1480 and the 
attendant challenges of slave raiding and more, “the tradition was not defeated. 
It adapted. It invented new structures. [N]o foreign ideals or basic concepts were 
accepted and not even much of a dent was made in the aspirations of in- 
dividuals” (p. 236f). Inherited at birth in each equatorial society, the tradition 
lived on for hundreds of years more, only to be destroyed by European conquest 
between 1880 and 1920. 

Why Core Homology Matters. Vansina’s (1990) study illustrates a key proposition
of the hierarchical model. Even in continental areas with high contact between
peoples, one can still trace “the historical course of a single tradition”
(p. 261). But there is a second important implication as well: reconstructing the
histories of peoples without written records requires that one distinguish between
homologies (similarities produced by culture birth), analogies (similarities
produced by convergence or parallel change), and synologies (similarities produced
by diffusion or borrowing). The reason, as Vansina noted, is that the
reconstruction of past cultures requires that one “seeks out homologies first” (p.
261). Only by identifying genuine cultural homologies can one establish the
nature of the initial ideational system that was later transformed by historical 
processes. To the extent that hypothesis 2 of the four proves valid, it offers a 
useful tool that societies with no written records can use to gain access to their 
own histories. 


The Descent of Small Cultural Components 

On this hypothesis, there is no central core culture that deserves special atten- 
tion in phylogenetic analysis. Rather, there are multiple “cores” and sometimes 
quite small units whose descent can be usefully traced. To characterize a narrow 
region on the continuum of possible hypotheses, we suppose that even the 
biggest deeply coherent blocks of culture are fairly small. 

Definition. The components are collections of memes that are transmitted as units 
with little recombination and slow change, and therefore their phylogenies can 
be reliably reconstructed to some depth. (As for the hierarchical hypothesis, how
much recombination and change are tolerable depends on the timescale — deeper 
phylogenies require more coherent units and slower rates of evolution.) On 
this hypothesis, different components diffuse and recombine at a rapid rate, 
compared with the rates of elements within components, so that core-like
complexes of components will have shallower phylogenies than their smaller con- 
stituent components. 

The processes that provide “glue” for the hierarchical core hypothesis also 
explain the coherence within these smaller units. The amendments needed are 
only quantitative. If the scope of integration provided by internal processes is 
limited, and if ethnocentric barriers to diffusion are weak or shift in which kinds of
components they protect, recombination between large blocks of memes will be
high, although the same processes may protect many small sets of coherent memes.
In practice, the units have to be large enough to have significant internal com- 
plexity, or their actual documented history has to be good. Otherwise, the amount 
of information available for descent reconstruction is limited. Thus, before the 
advent of modern molecular techniques, the functionally similar genes in various 
bacteria had a pattern of descent, but the traces of history needed to reconstruct 
the pattern were absent. When genes can be sequenced, a vastly greater array of 
data is available by reading the DNA strands directly. Strings of functionally 
irrelevant, highly improbable similarities and differences in the strands can now 
be used to construct phylogenies where classical biologists despaired. 

Is there any theoretical reason to expect smaller, rather than larger, coherent 
units in the cultural case? The fact that different cultural variants can be acquired 
from different people during different parts of the life cycle makes genealogical 
processes less effective at maintaining coherence than the analogous processes 
in the case of genetic evolution. We all have many cultural parents, with the 
attendant potential for independent samples of culture from many sources. At the 
same time, mixing could be less effective within small units because one can 
learn some things from one person or a small group of closely related mentors 
and other things from a quite different set of mentors. This may lead to small, 
but coherent, subcultures within a larger culture complex. For example, the cul- 
ture of science is fairly coherent and coexists within the same society as the 
culture of rock climbers, but people from each of these partial cultures may share 
the partial culture of the English language. (Of course, to some extent, science, 
rock climbing, and English are international institutions and provide avenues of 
communication among the cultures that play host to them.) On this argument, 
maintaining cultural coherence over large units faces a considerable mechanical 
obstacle due to the hyperrecombinatorial nature of the cultural transmission 
system. 

If one focuses on one special unit, such as those few features of language that 
cohere over long timescales, one may indeed find a few correlated units of other 
types that persist in having a pattern of descent in common with the language 
features, merely as a matter of chance. From one attempt at deep reconstruction 
to another, different pseudocore elements will be discovered. 

The linguistic characters used by historical linguists (basic lexicon, phono- 
logical rules) provide good examples of what is meant by a cultural component. 
Linguists can reconstruct a phylogeny for a basic lexicon and phonological rules 
that tells us the pattern of relationships among variants of this character. For 
example, we know that the basic lexicon and phonological rules that characterize 
English and German share a more recent ancestor than either does with French. In 
other words, we believe that we can trace back the sizable complex of memes that
underlie the English basic lexicon and phonology through a series of ancestor- 
descendant pairs to a point where the same people speak a language that has 
phonological rules and a basic lexicon that also forms the ancestor of German. 
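
The claim about English, German, and French can be made concrete with a small tree of ancestor-descendant links. This is only a sketch: the intermediate node names (`Proto-Germanic`, `Latin`, `Proto-Indo-European`) are conventional labels supplied for the example rather than taken from the text, and `PARENT`, `lineage`, `common_ancestor`, and `depth` are hypothetical helper names. The tree stands for the basic lexicon and phonology only, not for the languages as wholes.

```python
# Each entry maps a variant of the basic-lexicon-and-phonology component
# to its immediate ancestor; the root has no ancestor.
PARENT = {
    "English": "Proto-Germanic",
    "German": "Proto-Germanic",
    "French": "Latin",
    "Proto-Germanic": "Proto-Indo-European",
    "Latin": "Proto-Indo-European",
    "Proto-Indo-European": None,
}


def lineage(taxon):
    """Return the chain of ancestor-descendant links from taxon up to the root."""
    chain = []
    while taxon is not None:
        chain.append(taxon)
        taxon = PARENT[taxon]
    return chain


def common_ancestor(a, b):
    """Return the most recent node that lies on the lineages of both a and b."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    return None


def depth(node):
    """Number of links between node and the root (root has depth 0)."""
    return len(lineage(node)) - 1


if __name__ == "__main__":
    print(common_ancestor("English", "German"))
    print(common_ancestor("English", "French"))
```

A deeper common ancestor (larger `depth`) means a more recent split, so in this toy tree English and German are closer to each other than either is to French. The same structure could be built separately for any other component, which is what allows different components of one language to have different trees.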

Examples of Coherence of Small Units and Recombination among Them. A clear 
example of how sets of memes exhibit considerable coherence when borrowed 
between groups can be seen in the adoption of the “age organization” principle 
by Bantu peoples in Central and Eastern Africa (LeVine and Sangree, 1962). Age 
sets are an institution in which children born within a few years of one another 
are simultaneously initiated into a group of adolescents of nearly the same age 
(boys and girls into different sets). After initiation, a given age set is a corporate 
organization that is formally charged with a series of roles in succession (war- 
rior, married man, elder, etc.), with formal graduation from role to role of the 
whole set. 

The Tiriki (an offshoot of the Abaluhyia Bantu), for example, currently have 
an age organization almost identical to that of their Nilotic neighbors, the Terik, 
while remaining distinctively Abaluhyia in language and culture. This situation 
arose as a result of intense political turmoil in the mid-eighteenth century, when the 
Terik offered asylum to refugee segments of Abaluhyia lineages on condition 
that their men would become incorporated into the Terik warrior groups. At this 
time, the Tiriki warriors accepted the full set of initiation rituals for their sons 
(circumcision and seclusion) and adopted the seven named age-set system. In 
addition, the grades of warrior, retired warrior, judicial elder, and ritual elder 
emerged as the principal corporate units of political significance at the local level, 
and the Nilotic ideology of bravery and prowess in battle became predominant. 
Indeed, there is some evidence that the Tiriki became a distinct group within the 
Abaluhyia as a result of their adoption of Terik customs, as is suggested by
their name. Interestingly, the practice of female circumcision was viewed with 
disfavor by the Tiriki, such that they never adopted this trait. In short, this 
example shows how a number of cultural elements can be borrowed as a package, 
although not indiscriminately so, and the packages are often smallish. 

Linguistics also provides many good examples. Important components of the 
language spoken by a group of people often have a different evolutionary history 
from the basic lexicon and phonology of the same language. A substantial fraction 
of the words in the English lexicon (but not in the basic lexicon) share more 
recent common ancestors with words in French than with German. This is also 
true of English syntax, which is subject-verb-object like French, not subject-object-verb
like most Germanic languages. It is even true of aspects of English phonology. For 
example, English speakers distinguish veal and feel, apparently as a result of the 
influence of Norman loan words. Thus, we can identify coherent cultural entities, 
words, and syntactical and phonological rules that are longer lived than the larger 
complex called the English language, and whose ancestry can be traced back 
through independent series of ancestor-descendant relationships. Thomason and 
Kaufman (1991) provided numerous other examples, including the Ma’a lan- 
guage spoken in northern Tanzania, which, despite classification as a Nilotic 
language, has a basic lexicon related to Cushitic languages and a grammar related 
to Bantu languages. (We return to the problems that this example raises for the
practice of linguistic classification later.) 

Less formal data suggest that important social organizational rules and values
are often decoupled rather rapidly from descent, as can be reckoned by the
use of a basic lexicon and phonology. In Central and East Africa, for example,
cyclical and linear age sets, alternating generation classes, genital mutilation of 
males and females, warrior organizations, and many other associated practices 
are common among people whose basic lexicons are categorized as Nilotic, 
Cushitic, and Bantu. Although it was once thought that these customs were 
essentially of Cushitic origin, it is now clear from Ehret’s (1971) linguistic
analyses and voluminous ethnographic sources that different customs associated 
with the recruitment, function, and ritual validity of age organizations have been 
repeatedly borrowed between protolinguistic units over the last 5,000 years, 
reflecting periods of proximity, expansion, and dependence. The resulting situ- 
ation is one of a thorough intertwining of social organization and language. 

In some cases, the distribution of cultural traits appears to represent func- 
tional convergences, as in the case of the Tiriki, who adopted age sets and male 
circumcision in response to the turbulent militaristic conditions of the times. In 
other cases, there is evidence of a decoupling of apparently nonfunctional details. 
Thus, the Bantu Gusii conduct male and female genital mutilation but apparently
have never organized their men into age sets (LeVine and Sangree, 1962);
the Datoga dropped the 5-8 cycling age-set system of their protosouthern Nilotic
ancestors for noncycling generation classes (Ehret, 1971). The Bantu Kuria
provide a particularly revealing example of this complexity (Tobisson, 1986).
Men belong to age sets almost indistinguishable in name from those of the
southern Nilotes but are recruited on entirely different principles (father’s set
membership, rather than circumcision cohort). However, the Kuria have im- 
portant military units; these are based on circumcision but are organized quite 
differently from those of the Nilotes and are quite unrelated to the age-set 
system that among the Kuria bears Nilotic names. The inescapable conclusion to 
be drawn from these complex observations is that the phylogenies of language and
other cultural characters are often distinct.

Religious practices provide many further examples: the spread of the Sun 
Dance on the Great Plains, the spread of Islam from Western to Central and 
Eastern Asia and Northern Africa, millenarian movements in Melanesia, and so 
on. Ethnographic details are sometimes available for such borrowings, and the 
motives involved do not seem to be such as to enforce much coherence. For 
example, Sierra Leonean Creoles first adopted freemasonry in the late 1940s. 
The reason seems to have been that exclusive occupation of elite political roles 
had long served Creoles with an integrative community symbolic system. When 
Creoles lost power to the large majority of tribal peoples without a slave back- 
ground, this symbol system was lost. Freemasonry happened to be an available 
substitute and quickly became very important (Cohen, 1974). Of course,
national and imperial powers sometimes maintain symbolic units over wide areas
for impressive periods of time. The Habsburgs’ success in defending Catholi- 
cism and expelling Protestantism and Islam from their dominions during the life 
of the Austro-Hungarian Empire is a famous example. However, the need to
exercise a large measure of brute force to succeed in such an enterprise is perhaps
testimony to the long-run weakness of large-scale coherence.

There also may be rather well-bounded subcultures within a language group 
(as defined by a basic lexicon), as in the Indian caste system or the class, occu- 
pational, and religious subunits of many other state-level agricultural societies. 
Here, some memes are confined to some subset of the group — the castes, the 
guild, and so on. These subgroups may be marked by boundaries that are rather 
impervious to the flow of at least some kinds of memes. This phenomenon 
reaches its extreme in contemporary societies like the United States, where a 
diverse array of specialized subcultures of many types exists. 

These subgroups may be far more enduring than the “cultures” to which
they bear a somewhat temporary allegiance. For example, East Africanists often 
question the attribution of any time depth to the ethnic units currently residing 
in the area. This is not simply a consequence of European colonialist policy. 
Thus, Waller (1986) painted a picture of the nineteenth-century and earlier 
ephemeral political associations of clans with different linguistic and cultural 
backgrounds, linked through diverse patterns of intermarriage, trade, expansion, 
and dependency. These flexible and highly inclusive concepts of group identity 
are seen as an adaptation to heterogeneous and somewhat unpredictable envi- 
ronmental conditions (i.e., circumstances by no means unique to East Africa). 
Knauft (1985) told a similar story about the Gebusi and their neighbors, the 
Bedamini, in the Fly River area of Papua New Guinea. According to this picture, 
there would be frequent recombination of memes due to temporary association 
of peoples who exchange memes while in contact. 

Comparison of Core and Small Units Hypotheses. Whether such examples are more 
representative than those given by supporters of the core hypothesis is an im- 
portant, but unanswered, question. The little anthropological work done is not 
capable of answering this question. There are a few studies, but they are inde- 
cisive. Jorgensen's (1967, 1980) studies of the Salish and larger-scale analysis of 
the Indians of western North America are examples of the kind of comprehensive 
cultural analysis that might deliver. However, his methods are based on measures 
of overall similarity and difference and do not constitute proper analyses of de- 
scent. Biological systematists argue that the only evidence for membership in a 
given branch of a descent tree is given by characters that are shared by that branch 
alone but not more ancient or more recent similarities, much less similarities 
acquired by convergence. 

Even in the case of language, “wave” models of linguistic evolution
have long contended with “genetic” analyses based on strict criteria of descent
(Jorgensen, 1980; Mallory, 1989; Renfrew, 1987). Many features of Indo- 
European languages seem easier to account for if we assume that the whole 
family was in contact throughout most of its history and that innovative features 
tended to diffuse from multiple centers to neighboring languages. Treelike 
models of relationship can certainly be constructed for data that are substantially 
influenced by wavelike processes (e.g., with clustering algorithms). The fact that
a tree diagram explains much of the variation in a set of data does not guarantee
that the descent hypothesis is correct. It would be quite interesting to see
the modern “cladistic” methods of biological systematists formally applied to such
cultural descent problems. At least part of the solution to the debate between 
proponents of hierarchical core and small units hypotheses will rest on the appli- 
cation of sharper methodological tools, and biologists have something to offer. 


The Descent of Memes 

The boundary of the small units hypothesis toward the small end of the con- 
tinuum is not well defined. It is also possible that, aside from core vocabulary and 
phonology, there are few multimeme cultural units that are well protected from 
diffusion. It could be that each of the cultural things we observe is affected by 
many memes, that these memes readily diffuse from one socially or linguistically 
defined group to another, and that memes that affect different cultural com- 
ponents readily recombine. For example, a religious system might be affected by 
many different memes: beliefs about causation, beliefs about the role of men and 
women, beliefs about disease, and so on. This system could diffuse from one 
group to another, and then some of the memes could recombine with other 
aspects of the culture. Beliefs about the roles of men and women that came with 
the new religious system might then recombine with preexisting beliefs about 
subsistence practices, generating new, observable subsistence variants. If we 
could actually measure the memes that characterize different human groups, this 
case would be much like the previous one, except we would reconstruct the 
phylogenies largely of memes instead of whole cultural components.


Descent Analysis: Impossible or Uninteresting? 

There are several situations in which descent analysis regarding culture is im- 
possible. If we observe phenotype, and not the mental representations that are 
stored and transmitted, we cannot directly measure memes. The fact that many 
memes affect any given observable cultural attribute makes it difficult to trace 
the path of recombining memes, and reconstructing phylogenies is likely to be 
impossible. If the actual units to which descent might apply are as small as or 
smaller than our practically observable units, descent is impossible to trace 
simply because there is not enough information available to separate common 
descent from other hypotheses, such as independent origins. A quantitative 
character subject to blending inheritance is an extreme example. 

In some cases, methodological improvements may increase resolution. 
Comparative ethnographic data with age sets scored as present/absent, or as a 
quantitative variable on political importance, would not contain enough detail to 
reconstruct much history in East Africa. A richer data set offers more possibil- 
ities, as we have seen. 

The existence of coherent cultures will depend on the rate of diffusion and 
independent evolution. If the rate of diffusion among cultures for most char- 
acters is high, then there will be no cultural unit larger than some small atomistic 
unit of which to track the descent. Between the time that a newly formed group 
buds off its parent, and the time it creates buds itself, many new traits will have 
entered the group from outside. If the rate of evolution is high, the trace of
history also vanishes. High rates of random evolution, especially in simple
characters with few observable states, will eventually result in so many random
“hits” that descendant characters will have occupied all states fairly recently.
Similar simple artistic motifs are found in many cultures, perhaps because artists 
frequently rediscover and abandon them. Functional convergence presents sim- 
ilar problems. Around the world, tropical horticulturalists often live in small- 
scale societies that are murderously hostile to their neighbors. This commonality 
is presumably a by-product of the population densities and level of political 
organization supportable in wet tropical climates, not due to common ancestry. 
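The way rapid random change erases the trace of history can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming a symmetric model in which a character with k observable states changes to one of the other states, chosen at random, with probability m per generation. The model and the names `match_probability`, `k`, `m`, and `t` are illustrative assumptions, not quantities estimated from ethnographic data. It gives the probability that two daughter groups, each evolving independently for t generations after a split, still show the same state.

```python
def match_probability(k, m, t):
    """Probability that two daughter lineages share a character state.

    k: number of observable states of the character (assumed, k >= 2)
    m: per-generation probability that a lineage switches to one of the
       other k - 1 states, chosen uniformly (assumed symmetric model)
    t: generations elapsed in each lineage since the split
    """
    if k < 2:
        raise ValueError("a character needs at least two states")
    # Under the symmetric model the signal of shared ancestry decays
    # geometrically; lam is the per-generation decay factor.
    lam = 1.0 - m * k / (k - 1)
    # Two independent lineages of t generations each are separated by
    # 2 * t generations of change in total.
    return 1.0 / k + (1.0 - 1.0 / k) * lam ** (2 * t)


def excess_similarity(k, m, t):
    """Similarity above the 1/k level expected for unrelated groups."""
    return match_probability(k, m, t) - 1.0 / k
```

As t grows, the match probability falls to 1/k, the level expected for unrelated groups, so a shared state no longer carries information about common descent. The fewer the states, the higher that chance level, which is why simple motifs are poor evidence of shared history.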

Even when descent analysis is possible, it may be uninteresting. The few 
components that resist diffusion — basic lexicon and so on — will be descended 
from the grandparental group (defined in terms of basic lexicon), but most
components will not be descendants of components in that same grandparental 
group. Put another way, a culture is nothing more than its most elementary 
components. Each component may well be traceable back to a grandparental 
society. But a neighboring society may share particular grandparents for particular 
traits at random. Phylogenetic analysis could still be conducted for an element-by- 
element case, and this might be of interest or utility for some special cases. 
However, one important use of phylogeny is to make manageable the over- 
whelming complexity of populations and cultures. With no coherence, the 
analysis of descent could promise nothing in this regard. 


Partial Phylogenies and the Study of Adaptation 

Good phylogenies are crucial for the proper study of adaptation using the 
comparative method. Comparative studies attempt to determine the function 
of various attributes by looking for predicted correlations among societies. For 
example, Thornhill (1991) hypothesized that inbreeding avoidance rules function
to preserve capital in powerful families. To test this hypothesis, she collected
data on inbreeding rules and social stratification, predicting (accurately)
that the degree of elaboration of rules would positively correlate with the degree 
of social stratification. 

Similar studies utilizing correlations among species are widely used in 
comparative biology. A key problem in such comparative studies is determining 
the extent to which different societies (or species) are independent data points.
In comparative biology, only independently derived associations are counted as 
separate data points. Thus, if an innovation arises and then the lineage speciates, 
preserving the innovation in both daughter species, the daughter species should 
be counted as a single data point. The first step in the proper exercise of the 
comparative method is phylogenetic reconstruction (Harvey and Pagel, 1991). In
cross-cultural anthropology, this problem is referred to as “Galton’s problem.”
Scholars working in this discipline attempt to select their samples so as to include
only unrelated cultures or correct for diffusion by using statistical methods
(Burton and White, 1987).
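The counting rule borrowed from comparative biology can be sketched in code. The example below is a minimal illustration using Fitch parsimony, a standard technique from systematics that is substituted here for illustration and is not a method used in the studies cited above. The tree, the society labels, and the function name `fitch` are hypothetical. Given a descent tree and the observed trait states at its tips, it returns the minimum number of trait changes needed to explain the data, which is the number of independently derived events rather than the number of societies.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree)
    states: dict mapping each leaf name to its observed trait state
    Returns (set of equally parsimonious states at this node,
             minimum number of state changes below this node).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Both daughters can inherit the same state: no extra change.
        return shared, left_cost + right_cost
    # The daughters disagree: one change is required on this split.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree of four societies, ((A, B), (C, D)).
tree = (("A", "B"), ("C", "D"))

# The innovation arose once and was preserved in both daughters A and B.
inherited = {"A": 1, "B": 1, "C": 0, "D": 0}
# The same trait scattered across both branches of the tree.
scattered = {"A": 1, "B": 0, "C": 1, "D": 0}

changes_inherited = fitch(tree, inherited)[1]   # 1 change
changes_scattered = fitch(tree, scattered)[1]   # 2 changes
```

In the first case two societies carry the trait but only one event is counted, as the rule for daughter species requires. Note that a change counted this way may be a loss as well as a gain, and that the result depends entirely on the descent tree supplied, which is why phylogenetic reconstruction has to come first.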

Adaptations acquired by diffusion from other groups are related by descent 
to the adaptations in those groups. If one analogizes with the practice in biology,
such adaptations would not be counted as independent cases because the ad- 
aptation in the borrowing group is not an innovation. However, to the extent 
that diffusion represents the goal-driven choices of individuals in the borrowing 
group (or some other potentially adaptation-producing process), the borrowed 
trait is independent. If it had not been an adaptation, it would not have been 
adopted. This problem is particularly acute given that the rate of diffusion of 
new cultural adaptations through biased transmission is likely to be much higher 
than the rate of innovation. If this is so, most groups will adapt by borrowing, 
and it is unreasonably conservative to disregard these cases. 

The relationship between the Sun Dance and the buffalo-hunting ecology 
of the Great Plains people illustrates this difficulty. A summer ceremonial called 
the Sun Dance characterized all the Great Plains buffalo-hunting people. One 
might hypothesize that such a ceremony is related to the fission-fusion social 
organization that characterized the buffalo-hunting ecology of those people. But 
does one count this as one case, or several? It is likely that this ceremony orig- 
inated with the Crow and diffused to other tribes, so the various versions of the 
ceremony are not independent inventions. However, each group did adopt the 
ceremony, perhaps because it served the hypothesized need. Moreover, it could 
be that, in the absence of diffusion, each group would have independently de- 
veloped a summer ceremonial but did not because the rate of adaptation by 
diffusion is faster than independent invention (Oliver, 1962). 

On a longer temporal and spatial scale, the problem is also well illustrated 
by basic technical innovations like agriculture or iron working. Independent
inventions of these techniques were few indeed, fewer even than
the number of language-based descent groups that have subsequently adopted 
them. It seems absurd to say that we cannot really decide whether iron working 
is adaptive because all examples of iron-working technology are derived from a 
single common ancestor in Asia Minor about 3,400 years ago. Regardless of our 
answer of how many cases of iron working to count for purposes of estimating its 
adaptive value, it seems clear that language-based descent groups are largely 
irrelevant to solving this problem. We say “largely irrelevant” because it does 
seem that an association of an important adaptive innovation with a linguistic 
unit sometimes lasts long enough to carry the language area great distances, as 
with iron working and the Bantu expansion in Africa in the last millennium B.C. 
and the first millennium A.D. (Ehret, 1982); the use of abundant, but low- 
quality, plant resources and the spread of Numic languages in the American 
Great Basin (Bettinger and Baumhoff, 1982); and the domestication of the horse, 
invention of wheeled transport, and spread of Indo-European (Mallory, 1989). 
Note that such associations tend to persist only for a millennium or so, although 
the expansion of the innovating group tends to preserve the association. 


Conclusion 

It seems that, as regards most meme complexes, specific cultures are more like 
local populations within a species than like species. The whole human species is 
united by complex flows of ideas from one culture to another. This has always
been so, although the geographical isolation of the New World, Australia, and a
few other areas from each other and Eurasia may have substantially isolated large 
blocks of cultures on multimillennial timescales. On smaller time and space 
scales, other mechanisms of isolation and coherence do generate some patterns 
of descent that are traceable for a few millennia. 

The use of descent analysis for cultural units has a long, but controversial, 
history. Many authors claim a degree of success in reconstructing the history of 
descent of fairly large cultural units fairly far into the past. The most interesting 
outstanding question is the size and timescale of coherent units of culture. Do 
single cores in an interrelated complex have real histories that reach back five 
millennia or more? There seems to be no doubt that many small units have 
descent relationships that can be reliably inferred for this depth, but the upper 
size/time limit is not well defined by current methods. There is an ill-explored 
neutral analogy worth further work here. The cladistic revolution in systematic 
biology has sharpened concepts and built new tools for phylogenetic analysis. 
Might they be used, despite the problem of high diffusion rates among cultures 
compared with species, to help advance the resolution of genetic versus wave 
explanations of culture history?


17 Was Agriculture Impossible during the Pleistocene but Mandatory during the Holocene?

A Climate Change Hypothesis 

With Robert L. Bettinger 

Evolutionary thinkers have long been fascinated by the origin of 
agriculture. Darwin (1874) declined to speculate on agricultural origins, but 
twentieth-century scholars were bolder. The Soviet agronomist Nikolai Vavilov, 
the American geographer Carl O. Sauer, and the British archaeologist V. Gordon 
Childe wrote influential books and articles on the origin of agriculture in the 
1920s and 1930s (see Flannery, 1973, and MacNeish, 1991:4-19, for the intel- 
lectual history of the origin of agriculture question). These explorations were 
necessarily speculative and vague but stimulated interest in the question. 

Immediately after World War II, the American archaeologist Robert Braidwood
(Braidwood et al., 1983) pioneered the systematic study of agricultural
origins. From the known antiquity of village sites in the Near East and from the 
presence of wild ancestor species of many crops and animal domesticates in the 
same region, Braidwood inferred that this area was likely a locus of early do- 
mestication. He then embarked on an ambitious program of excavation in the 
foothills of the southern Zagros Mountains using a multidisciplinary team of 
archaeologists, botanists, zoologists, and earth scientists to extract the maximum 
useful information from the excavations. The availability of ¹⁴C dating gave his
team a powerful tool for determining the ages of the sites. Near Eastern sites older 
than about 15,000 B.P. excavated by Braidwood (Braidwood and Howe, 1960) 
and others were occupied by hunter-gatherers who put much more emphasis on 
hunting and unspecialized gathering than on collecting and processing the seeds 
of especially productive plant resources (Goring-Morris and Belfer-Cohen, 1998; 
Henry, 1989). Ages are given here as calendar dates before present (B.P.), where
present is taken to be 1950, estimated from ¹⁴C dates according to Stuiver et al.’s
(1998) calibration curves. The Braidwood team showed that about 11,000 years
ago, hunter-gatherers were collecting wild seeds, probably the ancestors of wheat
and barley, and were hunting the wild ancestors of domestic goats and sheep. At
the 9,000 B.P. site of Jarmo, the team excavated an early farming village. Using
much the same seed-processing technology as their hunter-gatherer ancestors 
2,000 years before, the Jarmo people were settled in permanent villages culti- 
vating early-domesticated varieties of wheat and barley. 

Numerous subsequent investigations now provide a reasonably detailed 
picture of the origins of agriculture in several independent centers and its sub- 
sequent diffusion to almost all of the earth suitable for cultivation. These in- 
vestigations have discovered no region in which agriculture developed earlier or 
faster than in the Near East, though a North Chinese center of domestication 
of millet may prove almost as early. Other centers seem to have developed later, 
or more slowly, or with a different sequence of stages, or all three. The spread of 
agriculture from centers of origin to more remote areas is well documented for 
Europe and North America. Ethnography also gives us cases where hunters and 
gatherers persisted to recent times in areas seemingly highly suitable for agri- 
culture, most notably much of western North America and Australia. Attempts 
to account for this rather complex pattern are a major focus of archaeology. 


Origin of Agriculture as a Natural Experiment 
in Cultural Evolution 

The processes involved in such a complex phenomenon as the origin of agricul- 
ture are many and densely entangled. Many authors have given climate change a 
key explanatory role (e.g., Reed, 1977:882-883). The coevolution of human 
subsistence strategies and plant and animal domesticates must also play an important
role (e.g., Blumler and Byrne, 1991; Rindos, 1984). Hunting-and-gathering
subsistence may normally be a superior strategy to incipient agriculture
(Cohen and Armelagos, 1984; Harris, 1977), and, if so, some local factor may be 
necessary to provide the initial impetus to heavier use of relatively low-quality, 
high-processing-effort plant resources that eventually result in plant domesti- 
cation. Population pressure is perhaps the most popular candidate (Cohen, 
1977). Quite plausibly, the complex details of local history entirely determine 
the evolutionary sequence leading to the origin and spread of agriculture in every 
region. Indeed, important advances in our understanding of the origins of agri- 
culture have resulted from pursuit of the historical details of particular cases 
(Bar-Yosef, 1998; Flannery, 1986).

Nonetheless, we propose that much about the origin of agriculture can be 
understood in terms of two propositions: 

Agriculture was impossible during the last glacial age. During the last glacial age, 
climates were variable and very dry over large areas. Atmospheric levels of CO₂
were low. Probably most important, last-glacial climates were characterized by 
high-amplitude fluctuations on timescales of a decade or less to a millennium. 
Because agricultural subsistence systems are vulnerable to weather extremes, and 
because the cultural evolution of subsistence systems making heavy, specialized use 
of plant resources occurs relatively slowly, agriculture could not evolve.

In the long run, agriculture is compulsory in the Holocene epoch. In contrast
to the Pleistocene climates, stable Holocene climates allowed the evolution of 
agriculture in vast areas with relatively warm, wet climates, or access to irriga- 
tion. Prehistoric populations tended to grow rapidly to the carrying capacity set 
by the environment and the efficiency of the prevailing subsistence system. Local 
communities that discover or acquire more intensive subsistence strategies will 
increase in number and exert competitive pressure on smaller populations with 
less intensive strategies. Thus, in the Holocene epoch, such intergroup compe- 
tition generated a competitive ratchet favoring the origin and diffusion of agri- 
culture. 1 

The great variation among local historical sequences in the adoption and 
diffusion of agriculture in the Holocene provides data to test our hypothesis. In 
the Near East, agriculture evolved rapidly in the early Holocene and became a 
center for its diffusion to the rest of western Eurasia. At the opposite extreme, 
hunting-and-gathering subsistence systems persisted in most of western North 
America until European settlement, despite many ecological similarities to the 
Near East. Thus, each local historical sequence is a natural experiment in the factors
that limit the rate of cultural evolution of more intensive subsistence strategies. For 
our hypothesis to be correct, the evolution of subsistence systems must be rapid 
compared to the time cognitively modern humans lived under glacial conditions 
without developing agriculture, but slow relative to the climate variation that 
we propose was the main impediment to subsistence intensification in the late 
Pleistocene epoch. By cultural evolution, we simply mean the change over time 
in the attitudes, skills, habits, beliefs, and emotions that humans acquire by 
teaching or imitation. In our view (Bettinger, 1991; Boyd and Richerson, 1985),
culture is best studied using Darwinian methods. We classify the causes of cultural
change into several “forces.” In a very broad sense, we recognize three
classes of forces: those due to random effects (the analogs of mutation and drift),
natural selection, and decision making (invention, individual learning, biased
imitation, and the like). The decision-making forces will tend to accelerate
cultural evolution relative to organic evolution, but by how much is a major 
issue in the explanation of agricultural origins. 


Was Agriculture Impossible in the Pleistocene? 

The Pleistocene geological epoch was characterized by dramatic glacial advances 
and retreats. Using a variety of proxy measures of past temperature, rainfall, ice 
volume, and the like, mostly from cores of ocean sediments, lake sediments, and 
ice caps, paleoclimatologists have constructed a stunning picture of climate 
deterioration over the last 14 million years (Bradley, 1999; Cronin, 1999; Lamb, 
1977; Partridge et al., 1995). The Earth's mean temperature dropped several
degrees and the amplitude of fluctuations in rainfall and temperature increased. 
For reasons that are as yet ill understood, glaciers wax and wane in concert with 
changes in ocean circulation, carbon dioxide, methane and dust content of the 
atmosphere, and changes in average precipitation and the distribution of precipitation
(Broecker, 1995). The resulting pattern of fluctuation in climate is
very complex. As the deterioration proceeded, different cyclical patterns of
glacial advance and retreat involving all these variables have dominated the 
pattern. A 21,700-year cycle dominated the early part of the period, a 41,000- 
year cycle between about 3 and 1 million years ago, and a 95,800-year cycle 
during the last million years (deMenocal and Bloemendal, 1995). Milankovitch’s
hypothesis that these variations are driven by changes in the earth’s orbit, and 
hence the solar radiation income in the different seasons and latitudes, fits the 
estimated temperature variation well, although doubts remain (Cronin, 1999: 
185-189). 

Rapid Climate Variation in the Late Pleistocene 

The long timescale climate change associated with the major glacial advances and 
retreats is not directly relevant to the origins of agriculture because it occurs so 
slowly compared to the rate at which human populations adapt by cultural 
evolution. However, the ice ages also have great variance in climate at much 
shorter timescales. For the last 400,000 years, very high-resolution climate proxy 
data are available from ice cores taken from the deep ice sheets of Greenland and 
Antarctica. Resolution of events lasting little more than a decade is possible in 
Greenland ice 80,000 years old, improving to monthly resolution 3,000 years 
ago. During the last glacial, the ice core data show that the climate was highly 
variable on time scales of centuries to millennia (Clark, Alley, and Pollard, 1999; 
Dansgaard et al., 1993; Ditlevsen, Svensmark, and Johnsen, 1996; GRIP 1993). 
Figure 17.1 shows data from the GRIP Greenland core. The δ¹⁸O curve is a
proxy for temperature; less negative values are warmer. Ca²⁺ is a measure of
the amount of dust in the core, which in turn reflects the prevalence of dust- 
producing arid climates. The last glacial period was arid and extremely variable 
compared to the Holocene. Sharp millennial-scale excursions occur in estimated 
temperatures, atmospheric dust, and greenhouse gases. The intense variability of 
the last glacial carries right down to the limits of the nearly 10-year resolution 
of the ice core data. The highest resolution records in Greenland ice (and lower 
latitude records) show that millennial-scale warmings and coolings often began 
and ended very abruptly and were often punctuated by quite large spikes of 
relative warmth and cold with durations of a decade or two (e.g., Grafenstein 
et al., 1999). Figure 17.2 shows Ditlevsen et al.’s (1996) analysis of a Greenland 
ice core. Not only was the last glacial age much more variable on timescales of a 
century and a half or more (150-year low-pass filter) but also on much shorter 
timescales (150-year high-pass filter). Even though diffusion and thinning within 
the ice core progressively erases high-frequency variation in the core (visible 
as the narrowing with increasing age of the 150-year high-pass data in figure 17.2),
the shift from full glacial conditions about 18,000 years ago to the Holocene 
interglacial is accompanied by a dramatic reduction in variation on timescales 
shorter than 150 years. The Holocene (the last relatively warm, ice-free 11,600 
years) has been a period of very stable climate, at least by the standards of the 
last glacial age. 2 
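The caption to figure 17.2 notes that the spectral filtering used by Ditlevsen et al. is roughly equivalent to a 150-year moving average (low-pass) and the residual from that average (high-pass). The sketch below implements only that rough moving-average equivalent, not the spectral method itself. It assumes an evenly spaced series with the window given in samples, and the function names `low_pass`, `high_pass`, and `variance` are illustrative.

```python
def low_pass(series, window):
    """Centered moving average of an evenly spaced series.

    window is the averaging width in samples (e.g., 150 for annual data
    and a 150-year filter). Near the ends of the series the average is
    taken over whatever part of the window exists. For an even window
    the effective width is window + 1 samples.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        chunk = series[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def high_pass(series, window):
    """Original series minus its low-pass filtered version."""
    smoothed = low_pass(series, window)
    return [x - s for x, s in zip(series, smoothed)]


def variance(values):
    """Population variance, used to compare variability between periods."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

Applied to a proxy record, comparing `variance(high_pass(...))` for the glacial and Holocene sections of the series would correspond to the contrast on timescales shorter than the window, and `variance(low_pass(...))` to the contrast on longer timescales.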

Figure 17.1. Profiles of a temperature index, δ¹⁸O, and an index of dust content, Ca²⁺,
from the GRIP Greenland ice core. 200-year means are plotted. The parts of the GRIP
profile representing the last interglacial may have been affected by ice flow so their
interpretation is uncertain (Johnsen et al., 1997). Note the high-amplitude, high-frequency
variation in both the temperature and dust records during the last glacial
age. The Holocene epoch is comparatively much less variable. Plotted from original
data obtainable at: ftp://ftp.ngdc.noaa.gov/paleo/icecore/greenland/summit/grip/isotopes/gripd18o.txt
and ftp://ftp.ngdc.noaa.gov/paleo/icecore/greenland/summit/grip/chem/ca.txt.

Figure 17.2. High-resolution analysis of the GRIP ice core δ¹⁸O data by Ditlevsen
et al. (1996). The low-pass filtered data show that the Holocene epoch is much
less variable than the Pleistocene on timescales of 150 years and longer. The high-pass
filtered data show that the Pleistocene was also much more variable on timescales less than
150 years. The high- and low-pass filtering used spectral analytic techniques. These are
roughly equivalent to taking a 150-year moving average of the data to construct the
low-pass filtered series and subtracting the low-pass filtered series from the original data
to obtain the high-pass filtered record. Since layer thinning increasingly affects deeper
parts of the core by averaging variation on the smallest scales, the high-pass variance is
reduced in the older parts of the core. In spite of this effect, the Pleistocene/Holocene
transition is very strongly marked.

The climate fluctuations recorded in high-latitude ice cores are also recorded
at latitudes where agriculture occurs today. Sediments overlain by anoxic water
that inhibits sediment mixing by burrowing organisms are a source of low- and
mid-latitude data with a resolution rivaling ice cores. Events recorded in North
Atlantic sediment cores are closely coupled to those recorded in Greenland ice
(Bond et al., 1993), but so are records distant from Greenland. Hendy and
Kennett (2000) report on water temperature proxies from sediment cores from
the often-anoxic Santa Barbara Basin just offshore of central California. These data
show millennial- and submillennial-scale temperature fluctuations from 60 to 18
thousand years ago with an amplitude of about 8°C, compared to fluctuations of
about 2°C in the Holocene epoch. As in the Greenland cores, the millennial-scale
events often show very abrupt onsets and terminations and are often punctuated 
by brief spikes of warmth and cold. Schulz, von Rad, and Erlenkeuser (1998) 
analyzed organic matter concentrations in sediment cores at oxygen minimum 
depths from the Arabian Sea deposited over the past 110 thousand years. The variation
in organic matter deposited is thought to reflect the strength of upwelling,
driven by changes in the strength of the Arabian Sea monsoon. AMS ¹⁴C dating of
both the Arabian Sea and Santa Barbara cores gives good time control in the 
upper part of the record, and the climate proxy variation is easily fit to Greenland 
ice millennial-scale interstadial-stadial oscillations. Allen, Watts, and Huntley 
(2000) examine the pollen profiles from the laminated sediments of Lago Grande
di Monticchio in southern Italy. Changes in the proportion of woody taxa in the
core were dominated by large-amplitude changes near the limits of resolution of
the data, about a century. The millennial-scale variations in this core also cor- 
relate with the Greenland record. Peterson et al. (2000) show that proxies for the 
tropical Atlantic hydrologic cycle have a strong millennial-scale signal that like- 
wise closely matches the Greenland pattern. 

Proxy records apparently showing the ultimate Younger Dryas
millennial-scale cold episode, strongly expressed in the North Atlantic records
12,600-11,600 B.P., have been reported from all over the world, including southern
German oxygen isotope variations (Grafenstein et al., 1999), organic geochemistry
of the Cariaco Basin, Venezuela (Werne et al., 2000), New Zealand
pollen (Newnham and Lowe, 2000), and California pollen (West, 2000). The 
Younger Dryas episode has received disproportionate attention because the time 
period is easily dated by 14C and is sampled by many lake and mountain glacier
cores too short to reach older millennial-scale events. As Cronin (1999:202-221) 
notes, the Younger Dryas is frequently detected in a diverse array of Northern 
Hemisphere climate proxies from all latitudes. The main controversy involves 
data from the Southern Hemisphere, where proxy data often do not show a cold 
period coinciding with the Younger Dryas, although some records show a similar 
Antarctic Cold Reversal just antedating the Northern Hemisphere Younger 
Dryas (Bennett, Haberle, and Lumley, 2000). 

Other records provide support for millennial-scale climate fluctuations 
during the last glacial age that cannot be convincingly correlated with the 
Greenland ice record. Cronin (1999:221-236) reviews records from the deep 
tropical Atlantic, Western North America, Florida, China, and New Zealand. 
Recent notable additions to his catalog include southern Africa (Shi et al., 2000),
the American Midwest (Dorale et al., 1998), the Himalayas (Richards, Owen,
and Rhodes, 2000), and northeastern Brazil (Behling et al., 2000). Clapperton
(2000) gives evidence for millennial-scale glacial advances and retreats from 
most of the American cordillera — Alaska and western North America through 
tropical America to the southern Andes. 

While the complex feedback processes operating in the atmosphere-biosphere- 
ocean system are not completely understood (Broecker 1995:241-270), plausible 
physical mechanisms could have linked temperature fluctuation in both hemi- 
spheres. For example, Broecker and Denton (1989) proposed an explanation 
based upon the effects of glacial meltwater on the deep circulation of the North 
Atlantic. Today, cold, salty water from the surface of the North Atlantic is the 
source of about half of the global ocean’s deep water. This large outflow of deep 
water currently must be balanced by an equally enormous inflow of warm surface 
and intermediate water into the high North Atlantic. If glacial meltwater lowered 
the salinity of the North Atlantic and interrupted the flow of deep water, the 
whole coupled atmosphere-ocean circulation system of the world would be 
perturbed. Broecker and Denton's hypothesis explains how the Northern and
Southern Hemisphere temperature and ice fluctuations could have been in phase
even though the direct effects of orbital-scale variation on the two hemispheres 
are out of phase. 

Impacts of Millennial-Scale and Submillennial-Scale 
Variation on Agriculture 

We believe that high-frequency climate and weather variation would have made 
the evolution of methods for intensive exploitation of plant foods extremely 
difficult. Holocene weather extremes significantly affect agricultural production 
(Lamb, 1977). For example, the impact of the Little Ice Age (400-150 b.p.) on 
European agriculture was quite significant (Grove, 1988). The Little Ice Age 
is representative of the Holocene millennial-scale variation that is very much 
more muted than last-glacial events of similar duration. Extreme years during 
the Little Ice Age caused notable famines and such extremes would have been 
more exaggerated and more frequent during last glacial times. The United Na- 
tions Food and Agriculture Organization’s (2000) Global Information and Early 
Warning System on Food and Agriculture gives a useful qualitative sense for the 
current impacts of interannual weather variation on food production. Quanti- 
tative estimates of current crop losses due to weather variation are difficult to 
make, but reasonable estimates run 10 percent on a countrywide basis (Gommes,
1999) and perhaps 10-40 percent on a state basis in Mexico, depending upon 
mean rainfall (Eakin, 2000). Gommes believes that weather problems account 
for half of all crop losses. 

If losses in the Holocene are this high and if high-frequency climate variation 
in the last glacial age increased at lower latitudes roughly as much as at 
Greenland, a hypothetical last-glacial farming system would face crippling losses 
in more years than not. Devastating floods, droughts, windstorms, and other 
climate extremes, which we experience once a century, might have occurred 
once a decade. In the tropics, rainfall was highly variable (Broecker, 1996). Few 
years would be suitable for good growth of any given plant population. Even 
under relatively benign Holocene conditions agriculturalists and intensive plant 
collectors have to make use of risk-management strategies to cope with yield 
variation. Winterhalder and Goland (1997) use optimal foraging analysis to ar- 
gue that the shift from foraging to agriculture would have required a substantial 
shift from minimizing risk by sharing to minimizing risk by field dispersal. Some 
ethnographically known Eastern Woodland societies that mixed farming and 
hunting, for example, the Huron, seemed not to have made this transition and to 
have suffered frequent catastrophic food shortages. Storage by intensive plant 
collectors and farmers is an excellent means of meeting seasonal shortfalls, but is 
a marginal means of coping with interannual risk, much less multiyear shortfalls 
(Belovsky, 1987:60). 3 

If Winterhalder and Goland are correct that considerable field dispersal is 
required to manage Holocene yield risks, it is hard to imagine that further field 
division would have been successful at coping with much larger amplitude fluc- 
tuations that occurred during the last glacial age. We expect that opportunism 
was the most important strategy for managing the risks associated with plant
foods during the last glacial age. Annual plants have dormant seed that spreads 
their risk of failure over many years, and perennials vary seed output or storage 
organ size substantially between years as weather dictates. In a highly variable 
climate, the specialization of exploitation on one or a few especially promising 
species would be highly unlikely, because “promise” in one year or even for a 
decade or two would turn to runs of years with little or no success. However, 
most years would likely be favorable for some species or another, so generalized 
plant-exploitation systems are compatible with highly variable climates. The 
acorn-reliant hunter-gatherers of California, for example, used several kinds of 
oak, gathering less favored species when more favored ones failed (Baumhoff, 
1963:table 2). Reliance on acorns demanded this generalized pattern of species
diversification because the annual production of individual trees is highly variable
from year to year, being correlated within species but independent between
species (Koenig et al., 1994). Pleistocene hunter-gatherer systems must have been
even more diversified, lacking the kind of commitment to a single resource cat- 
egory (acorns) observed in California. 
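
The logic of diversification can be put in one line. As a rough sketch, assume purely for illustration that each of k oak species fails in a given year with probability p, and that failures are independent between species, as the acorn data suggest. The symbols k and p are introduced here and are not in the original analysis.

$$ P(\text{all } k \text{ species fail in the same year}) = p^{k} $$

With p = 0.5, a single species fails in half of all years, while four independent species fail together with probability 0.5^4, or about 0.06. Because production is correlated within a species, adding more trees of the same species does little to reduce this risk.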

The evolution of intensive resource-use systems like agriculture is a rela- 
tively slow process, as we document. If ecological timescale risks could be 
managed some way, or if some regions lacked the high-frequency variation de- 
tected by the as yet few high-resolution climate proxy records, the evolution of 
sophisticated intensive strategies would still be handicapped by millennial-scale 
variation. Plant and animal populations responded to climatic change by dramat- 
ically shifting their ranges, but climate change was significant on the timescales 
shorter than those necessary for range shifts to occur. As a result, last-glacial 
natural communities must have always been in the process of chaotic reorga- 
nization as the climate varied more rapidly than they could reach equilibrium. 
The pollen record from the Mediterranean and California illustrates how much 
more dynamic plant communities were during the last glacial age (Allen et al.,
1999; Heusser 1995). Pleistocene fossil beetle faunas change even more rapidly 
than plants because many species, especially generalist predators, change their 
ranges more rapidly than plants. Hence, they are better indicators of the eco- 
logical impacts of the abrupt, large-amplitude climate changes recorded by the 
physical climate proxies from the last glacial (Coope, 1987). 

Could the evolution of intensive plant-exploitation systems have tracked 
intense millennial- and submillennial-scale variation? Plant food-rich diets take 
considerable time to develop. Plant foods are generally low in protein and often 
high in toxins. Some time is required to work out a balanced diet rich in plant 
foods, for example, by incorporating legumes to replace part of the meat in diets. 
Whether intensification and agriculture always lead to health declines due to nu- 
tritional inadequacy is debatable, but the potential for them to do so absent 
sometimes-subtle adaptations is clear (Cohen and Armelagos, 1984; Katz, Hediger, 
and Valleroy, 1974). The seasonal round of activities has to be much modified, 
and women’s customary activities have to be given more prominence relative to 
men’s hunting. Changes in social organization either by evolution in situ or by 
borrowing tend to be slow (Bettinger and Baumhoff, 1982; North and Thomas, 
1973). We doubt that even sophisticated last-glacial hunter-gatherers would 
have been able to solve the complex nutritional and scheduling problems associated
with a plant-rich diet while coping with unpredictable high-amplitude
change on timescales shorter than the equilibration time of plant migrations and 
shorter than actual Holocene trajectories of intensification. In keeping with our 
argument, the direct archaeological evidence suggests that people began to use 
intensively the technologies that underpinned agriculture only after about 
15,000 b.p. (Bettinger, 2000). 

Carbon Dioxide Limitation of Photosynthesis 

Plant productivity was also limited by lower atmospheric CO2 during the last
glacial. The CO2 content of the atmosphere was about 190 ppm during the last
glacial age, compared to about 250 ppm at the beginning of the Holocene (figure
17.3). Photosynthesis on earth is CO2-limited over this range of variation
(Cowling and Sykes, 1999; Sage, 1995). Beerling and Woodward (1993; see also
Beerling et al., 1993) have shown that fossil leaves from the last glacial age have
higher stomatal density, a feature that allows higher rates of gas exchange needed
to acquire CO2 under more limiting conditions. This higher stomatal conductance
also causes higher transpiration water losses per unit CO2 fixed, exacerbating
the aridity characteristic of glacial times. Beerling (1999) estimates the
total organic carbon stored on land as a result of photosynthesis during the Last 
Glacial Maximum using a spatially disaggregated terrestrial plant production 
model coupled to two different global climate models to provide the environ- 
mental forcing for plant growth. The model results differ substantially, one indi- 
cating a 33 percent lower, and the other a 60 percent lower, terrestrial carbon 
store at the Last Glacial Maximum compared to the Holocene. Mass-balance 
calculations based on stable isotope geochemistry also indicate a qualitatively 
large drop, but uncertainties regarding terrestrial δ13C lead to a similarly large
range of estimates. Low mean productivity, along with greater variance in pro- 
ductivity, would have greatly decreased the attractiveness of plant resources 
during the last glacial age. 

Lower average rainfall and carbon dioxide during the last glacial age reduced 
the area of the earth’s surface suitable for agriculture (Beerling, 1999). Diamond 
(1997) argues that the rate of cultural evolution is more rapid when innovations 
in local areas can be shared by diffusion. Thus, a reduction in the area suitable for 
agriculture and the isolation of suitable areas from one another will have a 
tendency to reduce the rate of intensification and make the evolution of agri- 
culture less likely in any given unit of time. Since the slowest observed rates of 
intensification in the Holocene epoch failed to result in agriculture until the 
European invasions of the last few hundred years, a sufficient slowing of the rate 
of evolution of subsistence could conceivably in itself explain the failure of 
agriculture to emerge before the Holocene. A slower rate of cultural evolution 
would also tend to prevent the rapid adaptation of intensive strategies during any 
favorable locales or periods that might have existed during the last glacial. 

On present evidence we cannot determine whether aridity, low CO2 levels,
millennial-scale climate variability, or submillennial-scale weather variation was
the main culprit in preventing the evolution of agriculture. Low CO2 and climate
variation would handicap the evolution of dependence on plant foods everywhere
and were surely more significant than behavioral or technological obstacles.
Hominids evolved as plant-using omnivores (Milton, 2000), and the basic
technology for plant exploitation existed at least 10 thousand years before the
Holocene (Bar-Yosef, 1998). At least in favorable localities, appreciable use
seems to have been made of plant foods, including large-seeded grasses, well
back into the Pleistocene (Kislev, Nadel, and Carmi, 1992). Significantly, we
believe, the use of such technology over spans of last-glacial time that were
sufficient for successive waves of intensification of subsistence in the Holocene
led to only minor subsistence intensification, compared to the Mesolithic,
Neolithic, and their ever-more-intensive successors.

Figure 17.3. Panel A shows the curve of atmospheric CO2 as estimated from gas
bubbles trapped in Antarctic glacial ice. Data from Barnola et al. (1987). Panel B
summarizes responses of several plant species to experimental atmospheres containing
various levels of CO2. Based on data summarized by Sage (1995).

Subsistence Responses to Amelioration 

As the climate ameliorated, hunter-gatherers in several parts of the world began 
to exploit locally abundant plant resources more efficiently, but only, current 
evidence suggests, during the Bølling-Allerød period of near-interglacial warmth
and stability. The Natufian sequence in the Levant is the best-studied and so far
earliest example (e.g., Bar-Yosef and Valla, 1991). One last siege of glacial climate,
the Younger Dryas from 12,900 b.p. until about 11,600 b.p., reversed these
trends during the Late Natufian (e.g., Goring-Morris and Belfer-Cohen, 1998).
The Younger Dryas climate was appreciably more variable than the preceding
Allerød-Bølling and the succeeding Holocene (Grafenstein et al., 1999; Mayewski
et al., 1993). The 10 abrupt, short, warm-cold cycles that punctuate the Younger
Dryas ice record were perhaps felt as dramatic climate shifts all around the
world. After 11,600 b.p., the Holocene period of relatively warm, wet, stable,
CO2-rich environments began. Subsistence intensification and eventually agriculture
followed. Thus, while not perfectly instantaneous, the shift from glacial
to Holocene climates was a very large change and took place much more rapidly 
than cultural evolution could track. 

Might we not expect agriculture to have emerged in the last interglacial 
130,000 years ago or even during one of the even older interglacials? No ar- 
chaeological evidence has come to light suggesting the presence of technologies 
that might be expected to accompany forays into intensive plant collecting or 
agriculture at this time. Anatomically modern humans may have appeared in 
Africa as early as 130,000 years ago (Klein 1999: ch. 7), but they were not
behaviorally modern. Humans of the last interglacial were uniformly archaic in 
behavior. Very likely, then, the humans of the last interglacial were neither 
cognitively nor culturally capable of evolving agricultural subsistence. However, 
climate might also explain the lack of marked subsistence intensification during 
previous interglacials. Ice cores from the thick Antarctic ice cap at Vostok show 
that each of the last four interglacials over the last 420,000 years was characterized
by a short, sharp peak of warmth, rather than the 11,600-year-long stable
plateau of the Holocene (Petit et al., 1999). Further, the GRIP ice core suggests
the last interglacial (130,000-80,000 b.p.) was more variable than the Holocene,
although its lack of agreement with a nearby replicate core for this time period
makes this interpretation tenuous (Johnsen et al., 1997). On the other hand, the
atmospheric concentration of CO2 was higher in the three previous interglacials
than during the Holocene and was stable at high levels for about 20,000 years
following the warm peak during the last interglacial. The highly continental
Vostok site unfortunately does not record the same high-frequency variation in
the climate as most other proxy climate records, even those in the Southern
Hemisphere (Steig et al., 1998). Some Northern Hemisphere marine and terrestrial
records suggest that the last interglacial was highly variable, while other data
suggest a Holocene-length period of stable climates ca. 127,000-117,000 b.p. 
(Frogley, Tzedakis, and Heaton, 1999). Better data on the high-frequency part of 
the Pleistocene beyond the reach of the Greenland ice cores are needed to test
hypotheses about events antedating the latest Pleistocene. Long marine cores from
areas of rapid sediment accumulation are beginning to reveal the millennial-scale 
record from previous glacial-interglacial cycles (McManus, Oppo, and Cullen, 
1999). At least the last five glacials have millennial-scale variations much like the 
last glacial. The degree of fluctuations during previous interglacials is still not 
clear, but at least some proxy data suggest that the Holocene has been less 
variable than earlier interglacials (Poli, Thunell, and Rio, 2000). 


During the Holocene, Was Agriculture Compulsory 
in the Long Run? 

Once a more productive subsistence system is possible, it will, over the long run, 
replace the less-productive subsistence system that preceded it. The reason is 
simple: all else being equal, any group that can use a tract of land more efficiently 
will be able to evict residents that use it less efficiently (Boserup, 1981; Sahlins 
and Service, 1960:75-87). More productive uses support higher population 
densities, or more wealth per capita, or both. An agricultural frontier will tend to 
expand at the expense of hunter-gatherers as rising population densities on the 
farming side of the frontier motivate pioneers to invest in acquiring land from 
less-efficient users. Farmers may offer hunter-gatherers an attractive purchase 
price, a compelling idea about how to become richer through farming, or a dis- 
mal choice of flight, submission, or military defense at long odds against a more 
numerous foe. Early farmers (and other intensifiers more generally) are also 
liable to target opportunistically high-ranked game and plant resources essential 
to their less-intensive neighbors, exerting scramble competitive pressure on them 
even in the absence of aggressive measures. Thus, subsistence improvement 
generates a competitive ratchet as successively more land-efficient subsistence 
systems lead to population growth and labor intensification. Locally, hunter- 
gatherers may win some battles (e.g., in the Great Basin; Madsen, 1994), but in 
the long run the more intensive strategies will win wherever environments are 
suitable for their deployment. 

The archaeology supports this argument (Bettinger, 2000). Societies in all 
regions of the world undergo a very similar pattern of subsistence efficiency in- 
crease and population increase in the Holocene, albeit at very different rates. 
Holocene hunter-gatherers developed local equilibria that, while sometimes lasting 
for thousands of years, were almost always replaced by more intensive equilibria. 


Alternative Hypotheses Are Weak 

Aside from other forms of the climate-change hypotheses described, archaeologists
have proposed three prominent hypotheses (climate stress, population
growth, and cultural evolution) to explain the timing of agricultural origins.
They were formulated before the nature of the Pleistocene-Holocene transition 
was understood but are still the hypotheses most widely entertained by archaeologists
(MacNeish, 1991). None of the three provides a close fit to the
empirical evidence or to theory.

Climate Stress Was First Too Common, Then Too Rare 

Childe (1951) proposed that terminal Pleistocene desiccation stressed forager 
populations and led to agriculture. Wright (1977) argued that Holocene climate 
amelioration brought pre-adapted plants into the Fertile Crescent areas where 
agriculture first evolved. Bar-Yosef (1998) and Moore and Hillman (1992) ar- 
gue that Late Natufian sedentary hunter-gatherers probably undertook the first 
experiments in cultivation under the pressure of the Younger Dryas climate 
deterioration. Natufian peoples lived in settled villages and exploited the wild
ancestors of wheat and barley beginning in the Allerød-Bølling warm period
(14,500-12,900 b.p.) (Henry, 1989) and then reverted to mobile hunting-and-gathering
during the sharp, short Younger Dryas climate deterioration (12,600-11,600
b.p.), the last of the high-amplitude fluctuations that were characteristic
of the last glacial (Bar-Yosef and Meadow, 1995; Goring-Morris and Belfer-Cohen,
1997). Post-Natufian cultures began to domesticate the same species as
warm and stable conditions returned after the Younger Dryas, around 11,600 b.p.
Unfortunately, a flat spot in the 14C/calendar-year calibration curve makes
precise dating difficult for the most critical several hundred years centered on
11,600 b.p. (Fiedel, 1999). As a component of an explanation of a local sequence
of change, such hypotheses may well be correct. Yet they raise the question of
why the 15 or so similar deteriorations and ameliorations of the last glacial age 
did not anywhere lead to agriculture or why most of the later origins of agri- 
culture occurred in the absence of Younger Dryas-scale deteriorations. Note also 
that, in principle, populations can adjust downward to lower carrying capacities 
through famine mortality even more quickly than they can grow up to higher 
ones. Such hypotheses cannot, we believe, explain the longer time- and larger 
spatial-scale problem of the absence of agriculture in the Pleistocene and its 
multiple origins and rapid spreads in the Holocene. 

The details of subsistence responses to the Younger Dryas in the areas of 
early origins of agriculture will eventually produce a sharp test of the variability 
hypothesis. We suggest that the late Natufian de-intensification in response to 
the Younger Dryas was a retreat from the trend leading to agriculture and was 
unlikely to have produced the first steps toward domestication. More likely, the 
late Natufian preserved remnants of earlier, more intensive Natufian technology 
and social organization that served to start the Levantine transition to agriculture 
at an unusually advanced stage after the Younger Dryas ended. Events in the 
Younger Dryas time period also provide an opportunity to investigate the effects 
of CO2 concentration partly independently of climate variability. The rise in
CO2 concentration in the atmosphere began two to three millennia before temperatures
began to rise and continued to increase steadily through the Younger
Dryas (Sowers and Bender, 1995). The Younger Dryas period de-intensification
of the Natufian suggests an independent effect of millennial or submillennial
variability. 

Population Growth Has the Wrong Timescale 

Cohen’s (1977) influential book argued that slowly accumulating global-scale 
population pressure was responsible for the eventual origins of agriculture be- 
ginning at the 11,600 B.P. time horizon. He imagines, quite plausibly, that sub- 
sistence innovation is driven by increases in population density, but, implausibly 
we believe, that a long, slow buildup of population gradually drove people to 
intensify subsistence systems to relieve shortages caused by population growth, 
eventually triggering a move to domesticates. Looked at one way, population 
pressure is just the population growth part of the competitive ratchet. However, 
this argument fails to explain why preagricultural hunter-gatherer intensification
and the transition to agriculture began in numerous locations after 11,600
years ago (Hayden, 1995). Assuming that humans were essentially modern by 
the Upper Paleolithic, they would have had 30,000 years to build up a popu- 
lation necessary to generate pressures for intensification. Given any reasonable 
estimate of the human intrinsic rate of natural increase under hunting-and-gathering
conditions (from somewhat less than 1% per year to 3% per year), populations
substantially below carrying capacity will double in a century or less, as we will 
see in the models that follow. 


A Basic Model of Population Pressure 

Since the population explanation for agriculture and other adaptive changes 4 
connected with increased subsistence efficiency remains very popular among 
archaeologists, we take the time here to examine its weakness formally. The 
logistic equation is one simple, widely used model of population growth. The
rate of change of population density, N, is given by: 

$$ \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \qquad (1) $$

where r is the “intrinsic rate of natural increase” — the rate of growth of popu- 
lation density when there is no scarcity — and K is the “carrying capacity,” the 
equilibrium population density when population growth is halted by density- 
dependent checks. In the logistic equation, the level of population pressure is 
given by the ratio N/K. When this ratio is equal to zero, the population grows at 
its maximum rate; there is no population pressure. When the ratio is one, 
density dependence prevents any population growth at all. It is easy to solve this 
equation and calculate the length of time necessary to achieve any level of pop- 
ulation pressure, n = N/K. 

$$ t = \frac{1}{r}\,\ln\!\left[\frac{n\,(1 - n_0)}{n_0\,(1 - n)}\right] \qquad (2) $$

where $n_0$ is the initial level of population pressure. Let us very conservatively
assume that the initial population density is only 1 percent of what could be 
sustained with the use of simple agriculture and that the maximum rate of
increase of human populations unconstrained by resource limitation is 1 percent
per year. Under these assumptions, the population will reach 99 percent of the
maximum population pressure (i.e., n = .99) in only about 920 years. Serendipitous
inventions (e.g., the bow and arrow) that increase carrying capacity do
not fundamentally alter this result. For example, only the rare single invention is 
likely to so much as double carrying capacity. If such an invention spreads within 
a population that is near its previous carrying capacity, it will still face half the 
maximum population pressure and thus significant incentive for further inno- 
vation. At an r of 1 percent, such an innovating population will again reach 99 
percent of the maximum population pressure in 459 years. 
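
The two waiting times quoted above can be checked with a few lines of code. This is a minimal sketch that assumes the standard closed-form solution of the logistic equation, t = (1/r) ln[n(1 - n0) / (n0(1 - n))]; the function and variable names are hypothetical and chosen only for illustration.

```python
import math

def years_to_pressure(n_target, n_start, r):
    """Years for logistic growth to raise population pressure N/K
    from n_start to n_target, given intrinsic rate of increase r per year."""
    odds_ratio = n_target * (1.0 - n_start) / (n_start * (1.0 - n_target))
    return math.log(odds_ratio) / r

# Sparse founding population: 1 percent of carrying capacity, r = 1 percent per year.
from_one_percent = years_to_pressure(0.99, 0.01, 0.01)

# An invention doubles K, so a population at its old capacity restarts at n = 0.5.
after_doubling_k = years_to_pressure(0.99, 0.50, 0.01)

print(f"{from_one_percent:.0f} years, {after_doubling_k:.0f} years")
```

Both values come out close to the figures in the text, roughly nine centuries in the first case and four and a half in the second. Because the waiting time scales as 1/r, doubling r halves both numbers.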

One might think that this result is an artifact of the very simple model of 
population growth. However, it is easy to add much realism to the model 
without any change of the basic result. In Appendix 1 we show that a more 
realistic version of the logistic equation actually leads to even more rapid growth 
of population pressure. 


Allowing for Dispersal 

Once, after listening to one of us propound this argument, a skeptical archae- 
ologist replied, “But you’ve got to fill up all of Asia, first.” This understandable 
intuitive response betrays a deep misunderstanding of the timescales of expo- 
nential growth. Suppose that the initial population of anatomically modern 
humans was only about $10^4$ and that the carrying capacity for hunter-gatherers is
very optimistically 1 person per square kilometer. Given that the land area of the
Old World is roughly $10^8$ km$^2$, $n_0 = 10^4/10^8 = 10^{-4}$. Then using equation 2 and
again assuming r = .01, Eurasia will be filled to 99 percent of carrying capacity in
about 1,400 years. The difference between increasing population pressure by a 
factor of 100 and by a factor of 10,000 is only about 500 years! 
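
The small difference follows from the logarithm in the closed-form logistic solution. As a short worked step, assume the standard solution $t = \frac{1}{r}\ln\frac{n(1-n_0)}{n_0(1-n)}$ and write $\Delta t$ (a symbol introduced here) for the extra waiting time when the starting pressure is $10^{-4}$ rather than $10^{-2}$:

$$ \Delta t = \frac{1}{r}\,\ln\!\left[\frac{10^{-2}\,(1 - 10^{-4})}{10^{-4}\,(1 - 10^{-2})}\right] \approx \frac{\ln 101}{0.01} \approx 460 \text{ years} $$

The target level n cancels, so lowering the starting density a hundredfold adds the same few centuries whatever threshold of population pressure is chosen.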

Moreover, this calculation seriously overestimates the amount of time that 
will pass before any segment of an expanding Eurasian population will experi- 
ence population pressure because populations will approach carrying capacity 
locally long before the entire continent is filled with people. R. A. Fisher (1937)
analyzed the following partial differential equation that captures the interaction 
between population growth and dispersal in space: 


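$$ \frac{\partial N(x)}{\partial t} = \underbrace{r N(x)\left(1 - \frac{N(x)}{K}\right)}_{\text{population growth}} + \underbrace{d\,\frac{\partial^2 N(x)}{\partial x^2}}_{\text{dispersal}} \qquad (3) $$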


Here N(x) is the population density at a point x in a one-dimensional environment.
Equation (3) says that the rate of change of population density in a particular
place is equal to the population growth there plus the net effect of
random, density-independent dispersal into and out of the region. The parameter 
d measures the rate of dispersal and is equal to the standard deviation of the 
distribution of individual dispersal distances. In an environment that is large 
compared to d, a small population rapidly grows to near carrying capacity at its 
initial location, and then, as shown in figure 17.4 (redrawn from Ammerman and
Cavalli-Sforza, 1984), begins to spread in a wave-like fashion across the environment
at a constant rate. Thus, at any given point in space, populations move
from the absence of population pressure to high population pressure as the wave
passes over that point. Figure 17.4 shows the pattern of spread for r = .01 and
d ~ 30. With these quite conservative values, it takes less than 200 years for the
wave front to pass from low population pressure to high population pressure.
More realistic models that allow for density-dependent migration also yield a
constant, wave-like advance of population (Murray, 1989), and although the
rates vary, we believe that the same qualitative conclusion will hold.

Figure 17.4. A numerical simulation of Fisher's equation showing that after an initial
period, population spreads at a constant rate so that at any point in space population
pressure increases to its maximum in less than 500 years for reasonable parameter
values. Horizontal axis: distance (km). (Redrawn from Ammerman and Cavalli-Sforza, 1984.)
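
The wave-like advance can be reproduced with a simple finite-difference simulation. The sketch below is illustrative only and is not the computation behind figure 17.4. It assumes that Fisher's equation takes the form dn/dt = r n (1 - n) + d d²n/dx², with n = N/K and d treated as the coefficient of the dispersal term in km² per year. The grid spacing, the founding population, and the 10 and 90 percent thresholds are likewise assumptions made here.

```python
import math

def simulate_fisher(r=0.01, d=30.0, dx=10.0, dt=1.0, length_km=2500.0,
                    years=2000, watch_km=(500.0, 1000.0)):
    """Integrate dn/dt = r*n*(1 - n) + d * d2n/dx2 with explicit finite differences.

    n is population pressure N/K. Returns the yearly time series of n at each
    watched distance from the founding site.
    """
    coef = d * dt / dx ** 2
    if coef >= 0.5:
        raise ValueError("time step too large for a stable explicit scheme")
    cells = int(length_km / dx) + 1
    n = [0.0] * cells
    for i in range(5):               # assumed founders: 50 km at 1% of capacity
        n[i] = 0.01
    watch = {x: int(round(x / dx)) for x in watch_km}
    series = {x: [n[i]] for x, i in watch.items()}
    for _ in range(years):
        new = [0.0] * cells
        for i in range(cells):
            left = n[i - 1] if i > 0 else n[1]                  # zero-flux edges
            right = n[i + 1] if i < cells - 1 else n[cells - 2]
            growth = r * n[i] * (1.0 - n[i])
            new[i] = n[i] + dt * growth + coef * (left - 2.0 * n[i] + right)
        n = new
        for x, i in watch.items():
            series[x].append(n[i])
    return series

def first_year_at_or_above(values, level):
    """Index (in years, since dt = 1) of the first value reaching `level`."""
    for year, value in enumerate(values):
        if value >= level:
            return year
    return None

series = simulate_fisher()
arrive_near = first_year_at_or_above(series[500.0], 0.5)
arrive_far = first_year_at_or_above(series[1000.0], 0.5)
speed = (1000.0 - 500.0) / (arrive_far - arrive_near)       # km per year
predicted = 2.0 * math.sqrt(0.01 * 30.0)                    # classical asymptotic speed
passage = (first_year_at_or_above(series[500.0], 0.9)
           - first_year_at_or_above(series[500.0], 0.1))    # years from 10% to 90%
print(f"front speed {speed:.2f} km/yr (asymptotic {predicted:.2f}); "
      f"10%-90% passage {passage} years")
```

Under these assumptions the front settles to a speed near the classical value 2·sqrt(r·d), and the time for a given location to go from low to high population pressure is a matter of centuries. The exact figure depends on which thresholds are taken to mark "low" and "high" pressure.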


The Dynamics of Innovation 

So far we have assumed that the carrying capacity of the environment is fixed 
(save where it is increased by fortuitous inventions). However, we know that 
people respond to scarcity caused by population pressure by intensifying produc- 
tion, for example, by shifting from less labor-intensive to more labor-intensive 
foraging, or by innovations that increase the efficiency of subsistence (Boserup, 
1981). Since innovation increases carrying capacity, intuition suggests that it 
might therefore delay the onset of population pressure. However, as the model 
in Appendix 2 shows, this intuition, too, is faulty. 

Figure 17.5 shows the results of the model in Appendix 2. A small popu- 
lation initially grows rapidly. As population pressure builds, population growth 
rate slows to a steady state in which population pressure is constant, and just 
enough innovation occurs to compensate for population growth. For plausible
parameter values, the second, steady-state phase of population growth is reached
in less than a thousand years. Interestingly, increasing the intrinsic rate of
innovation or the innovation threshold reduces the waiting time until population
pressure is important. Innovation allows greater population increases over the 
long run, but it does not change the timescales on which population pressure 

Figure 17.5. The logarithm of population size plotted as a function of time for the
model described in Appendix 2. Initially, when there is little population pressure,
population grows at a high rate. As the population grows, per capita income decreases,
and people intensify. Eventually the population growth rate approaches a constant value
at which the growth of intensification balances growth in population. For reasonable
parameters ($a = 0.005$, $r = 0.02$, $y_m = 1$, $y_s = 0.1$, $y_i = 0.2$, initial
population size 1 percent of initial carrying capacity), it takes less than 500 years
to shift from the initial low population pressure mode of growth to the final high
population pressure mode of growth.


occurs. The most important factor on timescales of a millennium or greater (if
not a century or greater, given realistic starting populations) is the rate of
intensification by innovation, not population growth.
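
The claim that innovation rather than demography sets the long-run pace can be checked with a little arithmetic on the Appendix 2 model. At the steady state, per capita income is y = (p y_s + a y_i) / (p + a), so population grows at p (y - y_s) = p a (y_i - y_s) / (p + a), which can never exceed a (y_i - y_s) however large the demographic parameter p becomes. The Python sketch below evaluates these expressions with the parameter values given in the figure 17.5 caption. Treating the caption's r as the appendix's p is an assumption of this sketch, and the two larger values of p are hypothetical.

```python
import math


def steady_state_income(p, a, y_s, y_i):
    """Steady-state per capita income, y = (p*y_s + a*y_i) / (p + a)."""
    return (p * y_s + a * y_i) / (p + a)


def long_run_growth_rate(p, a, y_s, y_i):
    """Population growth rate p*(y - y_s) once the steady state is reached."""
    return p * (steady_state_income(p, a, y_s, y_i) - y_s)


# Values from the figure 17.5 caption (caption's r is read as p: an assumption).
a, y_m, y_s, y_i = 0.005, 1.0, 0.1, 0.2
for p in (0.02, 0.2, 2.0):                       # 0.2 and 2.0 are hypothetical
    initial = p * (y_m - y_s)                    # growth rate in an empty habitat
    long_run = long_run_growth_rate(p, a, y_s, y_i)
    doubling = math.log(2) / long_run
    print(f"p={p}: initial {initial:.4f}/yr, long-run {long_run:.6f}/yr, "
          f"doubling time {doubling:.0f} yr")
print("upper bound a*(y_i - y_s):", a * (y_i - y_s))
```

Raising p a hundredfold raises the initial growth rate a hundredfold but leaves the long-run rate below the same ceiling, which depends only on the innovation parameters.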

This picture of the interaction of demography and innovation leads to 
predictions quite different from those of scholars like Cohen (1977). For ex- 
ample, we do not expect to see any systematic evidence of increased population 
pressure immediately prior to major innovations, an expectation consistent with 
the record (Hayden, 1995). If people are motivated to innovate whenever 
population pressure rises above an innovation threshold, and if, in the absence 
of successful innovation, populations adjust relatively quickly to changes in K by 
growth or contraction, then evidence of extraordinary stress — for example, 
skeletal evidence of malnutrition — is likely only when rapid environmental de- 
terioration exceeds a population’s capacity to respond via a combination of down- 
ward population adjustment and innovation. 5 Thus, for parameter values that 
seem anywhere near reasonable to us, population growth on millennial time- 
scales will be limited by rates of improvement in subsistence efficiency, not by 
the potential of populations to grow, just as Malthus argued. Populations can 
behave in non-Malthusian ways only under extreme assumptions about popu- 
lation dynamics and rates of intensification, such as the modern world in which 
the rate of innovation, but also the rate of population growth, is very high. 

Of course, in a time as variable as the Pleistocene epoch, populations may well 
have spent considerable time both far above and far below instantaneous carrying 
capacity. If agricultural technologies were quick and easy to develop, the population-
pressure argument would lead us to expect Pleistocene populations to shift in and 
out of agriculture and other intensive strategies as they find themselves in subsis- 
tence crises due to environmental deterioration or in periods of plenty due to ame- 
lioration. Most likely, minor intensifications and de-intensifications were standard 
operating procedure in the Pleistocene. However, the time needed to progress 
much toward plant-rich strategies was greater than the fluctuating climate allowed, 
especially given CO2- and aridity-limited plant production. 


Cultural Evolution Has the Wrong Timescale 

The timing of the origin of agriculture might possibly be explained entirely by the 
rate of intensification by innovation. For example, Braidwood (1960) argued 
that it took some time for humans to acquire enough familiarity with plant 
resources to use them as a primary source of calories, and that this “settling in” 
process limited the rate at which agriculture evolved. This proposal may explain 
the post-Pleistocene timing of the development of agriculture. However, if we 
interpret his argument to be that the settling-in process began with the evolution of 
behaviorally modern humans, the timescale is wrong again. There is no evidence 
that people were making significant progress at all toward agriculture for 30,000 
years, and Braidwood’s excavations at Jarmo show that some 4,000 years was 
enough to go from an unintensive hunting-and-gathering subsistence system to 
settled village agriculture in a fast case. Ten thousand years in the Holocene was 
ultimately sufficient for the development of plant-intensive gathering technologies 
or agriculture everywhere except in the coldest, plant-poor environments. 


The Pattern of Intensification across Cases 
Implicates Climate Change 

We have argued that Malthusian processes lead to population pressure much 
more quickly than assumed by such writers as Cohen (1977) and that the rate 
of cultural “settling in” and intensification is faster than Braidwood (1960) 
imagined, but not fast enough to intensify more than a small distance toward 
agriculture in the highly variable environments of the Pleistocene. Thus, our hy- 
pothesis that the abrupt transition from glacial to Holocene climates caused the 
origin of agriculture requires that Holocene rates of intensification be neither too 
slow nor too fast. 

Agriculture Was Independently Evolved about 10 Times 

The sample of origins is large enough to support some generalizations about the 
processes involved. Table 17.1 gives a rough time line for the origin of agricul- 
ture in seven fairly well-understood centers of domestication, two more con- 
troversial centers, three areas that acquired agriculture by diffusion, and two 
areas that were without agriculture until European conquest. 6 The list of inde- 
pendent centers is complete as far as current evidence goes, and while new 

Table 17.1. Dates before present in calendar years of achievement of plant-intensive
hunting and gathering and agriculture in different regions, mainly after Smith (1995)

Region                                               Intensive foraging   Agriculture

Centers of domestication
  Near East: Bar-Yosef and Meadow, 1995                    15,000            11,500
  North China: An, 1991; Elston et al., 1997               11,600           > 9,000
  South China: An, 1991                                    12,000?            8,000
  Sub-Saharan Africa: Klein, 1993                           9,000             4,500
  Southcentral Andes: Smith, 1995                           7,000             5,250
  Central Mexico: Smith, 1995                               7,000             5,750
  Eastern United States: Smith, 1995                        6,000             5,250

Controversial centers
  Highland New Guinea: Golson, 1977                             ?             9,000?
  Amazonia: Pearsall, 1995                                 13,000?            9,000?

Acquisition by diffusion
  Northwestern Europe                                      12,500             7,000
  Southwestern U.S.: Cordell, 1984; Doelle, 1999            6,000             3,500
  Japan: Aikens and Akazawa, 1996; Crawford, 1992          10,500             3,000

Never acquired agriculture
  California: Bettinger, 2000                               4,000               n/a
  Australia: Hiscock, 1994; Smith, 1987                     3,500               n/a

centers are not unexpected, it is unlikely that the present list will double. Nu- 
merous areas acquired agriculture by diffusion (societies acquire most of their 
technological innovations by diffusion, not independent invention), so the three 
areas in table 17.1 are but a small sample. The number of nonarctic areas with- 
out agriculture at European contact is small and the two listed, western North 
America and Australia, are the largest and best known. 

Two lines of evidence indicate that the seven centers of domestication are 
independent. First, the domesticates taken up in each center are distinctive, and 
no evidence of domesticates from other centers turns up early in any of the 
sequences. For example, in the eastern North American center a sunflower, a 
goosefoot, marsh elder, an indigenous squash, and other local plants were taken 
into cultivation around 6,000 b.p. Mesoamerican maize subsequently appeared 
here around 2,000 b.p. but remained a minor domesticate until around 1,100 b.p., 
when it suddenly crowded out several traditional cultivars (Smith, 1989). Sec- 
ond, archaeology suggests that none of the centers had agricultural neighbors at 
the time that their initial domestications were undertaken. The two problematic 
centers, New Guinea and Lowland South America, present difficult archaeo- 
logical problems (Smith, 1995). Sites are hard to find and organic remains are 
rarely preserved. The New Guinea evidence consists of apparently human- 
constructed ditches that might have been used in controlling water for taro 
cultivation. The absence of documented living sites associated with these features
makes their interpretation quite difficult. The lowland South American
evidence consists of starch grains embedded in pottery fragments and phytoliths, 
microscopic silicious structural constituents of plant cell walls. The large size of 
some early starch grains and phytoliths convinces some archaeologists that root 
crops were brought under cultivation in the Amazon Basin at very early dates. 

The timing of initiation of agriculture varies quite widely. The Near Eastern 
Neolithic is the earliest so far attested. In northern, and possibly southern, China, 
however, agriculture probably followed within a thousand years of the beginning of 
the Holocene, even though the best-documented, clearly agricultural complexes 
are still considerably later (An, 1991; Crawford, 1992; Lu, 1999). Agriculture may 
prove to be as early in northern China as in the Near East, since the earliest dated 
sites, which extend back to 8,500 B.P., represent advanced agricultural systems that
must have taken some time to develop. Excavations in northern China north of the 
earliest dated agricultural sites document a technological change around 11,600 
B.P., signaling a shift toward intensive plant and animal procurement that may have 
set this process in motion (Elston et al., 1997). 
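
To make the variation in timing concrete, the short Python sketch below computes the lag between the onset of intensive foraging and the onset of agriculture for the regions in table 17.1. Only rows with unqualified dates in both columns are used, and the function name and data layout are illustrative choices rather than anything taken from the sources cited in the table.

```python
# Dates in calendar years before present, copied from table 17.1.
# Rows with "?" or ">" qualifiers, or without agriculture, are omitted.
TABLE_17_1 = {
    "Near East": (15000, 11500),
    "Sub-Saharan Africa": (9000, 4500),
    "Southcentral Andes": (7000, 5250),
    "Central Mexico": (7000, 5750),
    "Eastern United States": (6000, 5250),
    "Northwestern Europe": (12500, 7000),
    "Southwestern U.S.": (6000, 3500),
    "Japan": (10500, 3000),
}


def intensification_lags(table):
    """Years between the start of intensive foraging and of agriculture."""
    return {region: foraging - farming
            for region, (foraging, farming) in table.items()}


lags = intensification_lags(TABLE_17_1)
for region, lag in sorted(lags.items(), key=lambda item: item[1]):
    print(f"{region:24s} {lag:5d} years")
```

On these figures the lag ranges from under a thousand years to several thousand, with the longest lags in two of the regions that acquired agriculture by diffusion.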

The exact sequence of events also varies quite widely. For example, in the 
Near East, sedentism preceded agriculture, at least in the Levantine Natufian 
sequence, but in Mesoamerica crops seem to have been added to a hunting-and- 
gathering system that was dispersed and long remained rather mobile (MacNeish, 
1991:27-29). For example, squash seems to have been cultivated around 
10,000 b.p. in Mesoamerica, some 4,000 years before corn and bean domestication 
began to lead to the origin of a fully agricultural subsistence system (Smith, 
1997). Some mainly hunting-and-gathering societies seem to have incorporated 
small amounts of domesticated plant foods into their subsistence system without 
this leading to full-scale agriculture for a very long time. Perhaps American do- 
mesticates were long used to provide specialized resources or to increase food 
security marginally (Richard Redding, personal communication) and initially 
raised human carrying capacities relatively little, thus operating the competitive 
ratchet quite slowly. According to MacNeish, the path forward through the 
whole intensification sequence varied considerably from case to case. 

A Late Intensification of Plant Gathering Precedes Agriculture 

In all known cases, the independent centers of domestication show a late se- 
quence of intensification beginning with a shift from a hunter-gatherer subsis- 
tence system based upon low-cost resources using minimal technological aids to 
a system based upon the procurement and processing of high-cost resources, 
including small game and especially plant seeds or other labor-intensive plant 
resources, using an increasing range of chipped and ground stone tools (Hayden, 
1995). The reasons for this shift are the subject of much work among archae- 
ologists (Bettinger, 2000). The shifts at least accelerate and become widespread 
only in the latest Pleistocene or Holocene. However, a distinct tendency toward 
intensification is often suggested for the Upper Paleolithic more generally. Stiner 
et al. and commentators (2000) note that Upper Paleolithic peoples often made 
considerable use of small mammals and birds in contrast to earlier populations. 
These species have much lower body fat than large animals, and excessive
consumption causes ammonia buildup in the body due to limitations on the rate
of urea synthesis (“rabbit starvation”; Cordain et al., 2000). Consequently, any
significant reliance on low-fat small animals implies corresponding compensation
with plant calories, and at least a few Upper Paleolithic sites, such as the Ohalo II
settlement on the Sea of Galilee (Kislev et al., 1992), show considerable use of
plant materials in Pleistocene diets. Large-seeded annual species like wild barley 
were no doubt attractive resources in the Pleistocene when present in abundance 
and would have been used opportunistically during the last glacial age. If our 
hypothesis is correct, in the last glacial age no one attractive species like wild 
barley would have been consistently abundant (or perhaps productive enough) 
for a long enough span of time in the same location to have been successfully 
targeted by an evolving strategy of intensification, even if their less intensive 
exploitation was common. The broad spectrum of species, including small game 
and plants, reflected in these cases is not per se evidence of intensification 
(specialized use of more costly but more productive resources using more labor 
and dedicated technology), as is sometimes argued (Flannery, 1971). In most 
hunter-gatherer systems, marginal diet cost and diet richness (number of species 
used) are essentially independent (Bettinger, 1994:46-47), and prey size is far
less important in determining prey cost than either mode or context of capture 
(Bettinger, 1993:51-52; Bettinger and Baumhoff, 1983:832; Madsen and Schmitt, 
1998). For all these reasons, quantitative features of subsistence technology are a 
better index of Pleistocene resource intensification than species used. We believe 
that the dramatic increase in the quantity and range of small chipped stone and 
groundstone tools only after 15,000 b.p. signals the beginning of the pattern of 
intensification that led to agriculture. 

Early intensification of plant resource use would have tended to generate the 
same competitive ratchet as the later forms of intensification. Hunter-gatherers 
who subsidize hunting with plant-derived calories can maintain higher popula- 
tion densities and thus will tend to deplete big game to levels that cannot sustain 
hunting specialists (Winterhalder and Lu, 1997). Upper Paleolithic people ap- 
pear to be fully modern in their behavioral capacities (Klein, 1999). Important 
changes in subsistence technology did occur during the Upper Paleolithic, for 
example, the development of the atlatl. Nevertheless, modern abilities and the 
operation of the competitive ratchet drove Upper Paleolithic populations only a 
relatively small distance down the path to the kind of heavy reliance on plant 
resources that in turn set the stage for domestication. 

Braidwood’s reasoning that pioneering agriculturalists would have gained 
their intimate familiarity with proto-domesticates first as gatherers is logical and 
supported by the archaeology. Once the climate ameliorated, the rate of inten- 
sification accelerated immediately in the case of the Near East. In other regions 
changes right at the Pleistocene-Holocene transition were modest to invisible 
(Straus et al., 1996). The full working out of agrarian subsistence systems took 
thousands of years. Indeed, modern breeding programs illustrate that we are still 
working out the possibilities inherent in agricultural subsistence systems. 

The cases where Holocene intensification of plant gathering did not lead 
directly to agriculture are as interesting as the cases where it did. The Jomon of 
Japan represents one extreme (Imamura, 1996). Widespread use of simple pottery,
a marker of well-developed agricultural subsistence in western Asia, was
very early in the Jomon, contemporary with the latest Pleistocene Natufian in the 
Near East. By 11,000 years B.P., the Jomon people lived in settled villages, de- 
pended substantially upon plant foods, and used massive amounts of pottery. 
However, the Jomon domesticated no plants until rather late in the sequence. 
Seeds of weedy grasses are found throughout, but only in later phases (after about 
3,000 b.p.) do the first unambiguous domesticates occur, and these make up only a 
small portion of the seeds in archaeological contexts (Crawford, 1997). Sophis- 
ticated agriculture came to Japan with imported rice from the mainland only 
about 2,500 b.p. Interestingly, acorns were a major item of Jomon subsistence. 
The people of California were another group of sedentary hunter-gatherers that 
depended heavily on acorns. However, in California the transition to high plant 
dependence began much later than in the Jomon (Wohlgemuth, 1996). Milling- 
stones for grinding small seeds became important after 4,500 b.p., although seeds 
were of relatively minor importance overall. After 2,800 b.p. acorns processed 
with mortars and pestles became an important subsistence component and small 
seeds faded in comparative importance. In the latest period, after 1,200 b.p. 
quantities of small seeds were increasingly added back into the subsistence mix 
alongside acorns in a plant-dominated diet. Other peoples with a late onset of 
intensification include the Australians. The totality of cases tells us that any stage of 
the intensification sequence can be stretched or compressed by several thousand 
years but reversals are rare (Harris, 1996; Price and Gebauer, 1995). Farming did 
give way to hunting-and-gathering in the southern and eastern Great Basin of 
North America after a brief extension of farming into the region around 1,000 B.P. 
(Lindsay, 1986). A similar reversal occurred in southern Sweden between 2,400 
and 1,800 b.p. (Zvelebil, 1996). Horticultural Polynesian populations returned 
substantially to foraging for a few centuries while population densities built up on 
reaching the previously uninhabited archipelagos of Hawaii and New Zealand 
(Kirch, 1984). Had intensification on plant resources been possible during the last
glacial age, even the slowest Holocene rates of intensification would have been rapid
enough to produce highly visible archaeological evidence on the 10-millennium timescale,
one-third or less of the time that Upper Paleolithic peoples lived under glacial climates.


More Intensive Technologies Tend to Spread 

One successful and durable agricultural origin in the last glacial age on any 
sizeable land mass would have been sufficient to produce a highly visible ar- 
chaeological record, to judge from events in the Holocene epoch. Once well- 
established agricultural systems existed in the Holocene, they expanded at the 
expense of hunting-and-gathering neighbors at appreciable rates (Bellwood, 
1996). Ammerman and Cavalli-Sforza (1984) summarize the movement of 
agriculture from the Near East to Europe, North Africa, and Asia. The spread 
into Europe is best documented. Agriculture reached the Atlantic seaboard 
about 6,000 b.p. or about 4,000 years after its origins in the Near East. The reg- 
ularity of the spread, and the degree to which it was largely a cultural diffusion 
process as opposed to a population dispersion as well, are matters of debate. 
Cavalli-Sforza, Menozzi, and Piazza (1994:296-299) argue that demic expansion
by western Asians was an important process with the front of genes moving
at about half the rate of agriculture. They imagine that pioneering agricultural 
populations moved into territories occupied by hunter-gatherers and inter- 
married with the preexisting population. The then-mixed population in turn 
sent agricultural pioneers still deeper into Europe. They also suppose that the 
rate of spread was fairly steady, though clearly frontiers between hunter-gatherers
and agriculturalists stabilized in some places (Denmark, Spain) for relatively
prolonged periods. Zvelebil (1996) emphasizes the complexity and
durability of frontiers between farmers and hunter-gatherers and the likelihood 
that in many places the diffusion of both genes and ideas about cultivation was a 
prolonged process of exchange across a comparatively stable ethnic and eco- 
nomic frontier. Further archaeological and paleogenetic investigations will no 
doubt gradually resolve these debates. Clearly, the spread process is at least 
somewhat heterogeneous. 

Other examples of the diffusion of agriculture are relatively well docu- 
mented. For example, maize domestication is dated to about 6,200 b.p. in 
Central Mexico, spreading to what is now the southwestern United States (New
Mexico) by about 4,000 b.p. (Matson, 1999; Smith, 1995). In this case, the
frontier of maize agriculture stabilized for a long time, only reaching the area
that is now the eastern United States at the comparatively late date noted. Maize failed
entirely to diffuse westward into the Mediterranean parts of California even 
though peoples growing it in the more arid parts of its range in the Southwest 
used irrigation techniques that have eventually worked in California with modest 
modifications to cope with dry-season irrigation. As with the origin process, the 
rate of spread of agriculture exhibits an interesting degree of variation. 


Changes in the Cultural Evolutionary System? 

A possible alternative to our hypothesis would be that a substantial moderni- 
zation of the cultural system occurred coincidently at the end of the Pleistocene 
epoch and that this resulted in a general acceleration of rates of cultural evo- 
lution, including subsistence intensification. The modernization of culture ca- 
pacities leading up to the Upper Paleolithic transition was presumably such an 
event, as were later inventions like literacy (Donald, 1991; Klein, 1999: ch. 7).
We are not aware of any proposals for major changes in the intrinsic rate of 
cultural evolution coincident with the Pleistocene-Holocene boundary. Students 
of the evolution of subsistence intensification and social complexity in the Ho- 
locene have suggested a series of plausible processes that will probably turn out 
to be at least part of the explanation for why the trend to intensification has 
taken such diverse forms in different regions (table 17.2). This list of diversifying
and rate-limiting processes does not include any that should have operated more 
stringently on Upper Paleolithic, as opposed to Mesolithic and Neolithic, socie- 
ties, climate effects aside. Holocene rates of intensification do have the right time- 
scale to be drastically affected by millennial- and submillennial-scale variation 
that is rapid with respect to observed rates of cultural evolution in the Holocene. 

Table 17.2. Processes that may retard the rate of cultural evolution and create local
optima that halt evolution for prolonged periods

Process [Authors (examples)]

Geography: Eurasia, having the largest land mass, has more local populations to
exchange innovations by diffusion, hence the fastest Holocene rate of subsistence
intensification.

Minor climate change: The late medieval onset of the Little Ice Age caused the
extinction of the Greenland Norse colony. Agriculture at marginal altitudes in places
like the Andes seems to respond to Holocene climatic fluctuation.

Preadapted plants: The Mediterranean Old World is unusually well endowed with
large-seeded grasses susceptible to domestication pressures. American domesticates,
especially maize, may outcross too much to respond quickly to selection.

Diseases: Density-dependent epidemic diseases may evolve that slow or stop the
population growth, pending the evolution of resistance, that would otherwise drive the
competitive ratchet. Local diseases that attack foreigners may protect
otherwise-vulnerable systems. [Crosby, 1986; Gifford-Gonzalez, 2000; McNeill]

New technological complexes evolve slowly: Nutritional adequacy in plant-rich diets
requires discovering cooking techniques, acquiring balancing domesticates, developing
the potential of animal domesticates, and the like. [Katz]

New social institutions evolve slowly: Social institutions are generally deeply
involved in subsistence but are also liable to be regulated by norms that make adaptive
evolution away from current local optima difficult. [Bettinger, 1999; Bettinger and
Baumhoff, 1982; North and Thomas, 1973; Richerson and Boyd, 1999]

Ideology may play a role: The evolution of fads, fashions, and belief systems may act
to drive cultural evolution in nonutilitarian directions that sometimes carry them to
new adaptive slopes. [Weber, 1930]


If climate variation did not limit intensification during the last glacial age to 
vanishingly slow rates compared to the Holocene epoch, the failure of intensive 
systems to evolve during the tens of millennia anatomically and culturally 
modern humans lived as sophisticated hunter-gatherers before the Holocene is a 
considerable mystery. 

Conclusion

The large, rapid change in environment at the Pleistocene-Holocene transition 
set off the trend of subsistence intensification of which modern industrial in- 
novations are just the latest examples. If our hypothesis is correct, the reduction 
in climate variability, increase in CO2 content of the atmosphere, and increases
in rainfall rather abruptly changed the earth from a regime where agriculture 
was impossible everywhere to one where it was possible in many places. Since 
groups that use efficient, plant-rich subsistence systems will normally outcompete 
groups that make less efficient use of land, the Holocene has been characterized 
by a persistent, but regionally highly variable, tendency toward subsistence
intensification. The diverse trajectories taken by the various regional human
subpopulations since about 11,600 b.p. are natural experiments that will help us
elucidate the factors that control the tempo of cultural evolution and that
generate historical contingency against the steady, convergent adaptive pressure
toward ever more intense production systems. A long list of processes (table 
17.2) interacted to regulate the nearly unidirectional trajectory of subsis- 
tence intensification, population growth, and institutional change that the 
world’s societies have followed in the Holocene. Notably, even the slowest 
evolving regions generated quite appreciable and archaeologically visible inten- 
sification, demanding some explanation for why similar trajectories are absent in 
the Pleistocene. 

Those who are familiar with the Pleistocene epoch often remark that the 
Holocene is just the “present interglacial.” The return of climate variation on the 
scale that characterized the last glacial age is quite likely if current ideas about 
the Milankovitch driving forces of the Pleistocene are correct. Sustaining
agriculture under conditions of much higher amplitude, high-frequency
environmental variation than farmers currently cope with would be a considerable
technical challenge. At the very best, lower CO2 concentrations and lower
average precipitation suggest that world average agricultural output would fall
considerably. 

Current anthropogenic global warming via greenhouse gases might at least 
temporarily prevent any return to glacial conditions. However, we understand 
the feedbacks regulating the climate system too poorly to have any confidence in 
such an effect. Current increases in CO2 threaten to elevate world temperatures
to levels that in past interglacials apparently triggered a large feedback effect
producing a relatively rapid decline toward glacial conditions (Petit et al., 1999).
The Arctic Ocean ice pack is currently thinning very rapidly (Kerr, 1999). A 
dark, open Arctic Ocean would dramatically increase the summer heat income 
at high northern latitudes and have large, difficult-to-guess impacts on the 
Earth’s climate system. No one can yet estimate the risks we are taking of a rapid
return to a colder, drier, more variable environment with less CO2 or evaluate
exactly the threat such conditions imply for the continuation of agricultural 
production. Nevertheless, the intrinsic instability of the Pleistocene climate 
system, and the degree to which agriculture is likely dependent upon the Ho- 
locene stable period, should give one pause (Broecker, 1997). 

APPENDIX 1: More Realistic Population Dynamics

The logistic equation assumes that an increment to population density has the same 
effect on population pressure at low densities as at high densities. We know that this 
assumption is not correct in all cases. For example, hunters pursuing herd animals 
may generate much population pressure at low human population densities because 
killing only a small fraction of the herd makes the many survivors difficult to hunt. 
On the other hand, subsistence farmers spreading into a uniform fertile plain may 
feel little population pressure until all farmland is occupied. If returns to additional 
labor on shrinking farms then drop steeply, most population pressure will be felt at 
densities near K. To allow for such effects, ecologists often utilize a generalized
logistic equation:

$$\frac{dN}{dt} = rN\left[1 - \left(\frac{N}{K}\right)^{\theta}\right] \qquad \text{(A1.2)}$$

Population pressure is now given by the term $(N/K)^{\theta}$. If $\theta > 1$, population pressure
does not increase until densities approach carrying capacity, as is usually the case for
species like humans that have flexible behavior and considerable mobility, and thus
can mitigate the effects of increasing population density over some range of densities.
It seems intuitive that this would increase the length of time necessary to reach a
given level of population pressure. However, this intuition is wrong. The generalized
logistic can be used to derive a differential equation for $n = (N/K)^{\theta}$:

$$\frac{dn}{dt} = \frac{\theta}{N}\left(\frac{N}{K}\right)^{\theta}\frac{dN}{dt}
= \theta r \left(\frac{N}{K}\right)^{\theta}\left[1 - \left(\frac{N}{K}\right)^{\theta}\right]
= \theta r n (1 - n)$$

Thus, the differential equation for population pressure is always the ordinary logistic
equation in which $K = 1$ and $r$ is multiplied by $\theta$. This means that when $\theta > 1$, it takes
less time to reach a given amount of population pressure than would be the case if
$\theta = 1$. Reduced population pressure at low densities leads to more rapid initial
population growth. Population growth is close to exponential longer and this more than
compensates for the fact that higher densities have to be reached to achieve the same
level of population pressure.
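
The result above can be illustrated numerically. Because population pressure n = (N/K)^θ follows an ordinary logistic equation with rate rθ and carrying capacity 1, the time needed to reach a given level of pressure has a closed form. The Python sketch below evaluates it for a few values of θ. The starting density (1 percent of K), the value r = 0.01, the target level of 0.9, and the function name are illustrative assumptions, not values given in the appendix.

```python
import math


def time_to_pressure(target, x0, r, theta):
    """Years for population pressure (N/K)**theta to rise to `target`.

    x0 is the initial density as a fraction of carrying capacity, N0/K.
    Uses the closed-form solution of the logistic equation for
    n = (N/K)**theta, which grows at rate r*theta toward a ceiling of 1.
    """
    n0 = x0 ** theta
    if not (0.0 < n0 < target < 1.0):
        raise ValueError("need 0 < (N0/K)**theta < target < 1")

    def logit(v):
        return math.log(v / (1.0 - v))

    # For a logistic with rate r*theta: logit(n(t)) = logit(n0) + r*theta*t.
    return (logit(target) - logit(n0)) / (r * theta)


r, x0 = 0.01, 0.01                       # illustrative values only
for theta in (1.0, 2.0, 4.0):
    years = time_to_pressure(0.9, x0, r, theta)
    print(f"theta={theta}: {years:.1f} years to reach pressure 0.9")
```

In this sketch the saving from a larger θ shows up for high target levels such as 0.9. When both cases start from the same density N0/K, the times needed to reach a pressure of 0.5 differ by only about a year.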


APPENDIX 2: The Dynamics of Innovation 


Consider a population of size $N$ in which the per capita income of the population is
given by:

$$y = \frac{y_m I}{I + N} \qquad \text{(A2.1)}$$

where $y_m$ is the maximum per capita income, and $I$ is a variable that represents the
productivity of subsistence technology. Thus, per capita income declines as population
size increases, but for a given population size, greater productivity raises per
capita income. As in the previous models, we assume that as population pressure,
now measured as falling per capita income, increases, population growth decreases.
In particular, assume:

$$\frac{dN}{dt} = pN\,[y - y_s] \qquad \text{(A2.2)}$$

where y_s is the per capita income necessary for subsistence. If per capita incomes are 
above this value, population increases; if per capita income falls below y_s, population 
shrinks. If I is fixed, this equation is another generalization of the logistic equation. In 
an initially empty environment, population initially grows at a rate p(y_m − y_s), but 
then slows and reaches an equilibrium population size: 


$$\hat{N} = \frac{I\,(y_m - y_s)}{y_s} \qquad \text{(A2.3)}$$


To allow for intensification we assume that people innovate whenever their per 
capita income falls below a threshold value y_i. Thus: 


$$\frac{dI}{dt} = aI\,(y_i - y) \qquad \text{(A2.4)}$$

When per capita income is less than the threshold value y_i, people innovate, 
increasing the carrying capacity and therefore decreasing population pressure. When 
per capita income is greater than the threshold, they “de-innovate.” This may seem 
odd at first, but such abandonment of more efficient technology has been observed 
occasionally. The maximum rate at which innovation can occur is governed by the 
parameter a. 

If a small pioneer population enters an empty habitat, it experiences two distinct 
phases of expansion (figure 17.5). Initially, per capita income is near the maximum, 
and population grows at the maximum rate. As population density increases, per 
capita income drops below y_i, and the population begins to innovate, eventually 
reaching a steady state value: 


$$\hat{y} = \frac{p\,y_s + a\,y_i}{p + a} \qquad \text{(A2.5)}$$


The steady state per capita income is above the minimum for subsistence but below 
the threshold at which people experience population pressure and begin to innovate. 
At this steady state population growth continues at a constant rate, 

$$\frac{1}{N}\frac{dN}{dt} = \frac{p\,a\,(y_i - y_s)}{p + a} \qquad \text{(A2.6)}$$

that is proportional to the rate of growth of subsistence efficiency. 
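The two-phase dynamics described in this appendix can be explored with a short simulation. The sketch below integrates equations (A2.2) and (A2.4) by the Euler method, with income given by (A2.1), and compares the long-run income and per capita growth rate with the steady-state expressions (A2.5) and (A2.6). All parameter values and function names are hypothetical choices made only for illustration; the text does not supply numbers.

```python
def simulate_innovation(p=1.0, a=0.1, y_m=4.0, y_s=1.0, y_i=2.0,
                        N0=0.01, I0=1.0, dt=0.01, t_end=100.0):
    """Euler integration of dN/dt = p*N*(y - y_s) and dI/dt = a*I*(y_i - y).

    Per capita income is y = y_m * I / (I + N), as in (A2.1).
    Returns the final income and the final per capita growth rate of N.
    """
    N, I = N0, I0
    for _ in range(int(t_end / dt)):
        y = y_m * I / (I + N)
        dN = p * N * (y - y_s)   # population responds to income above subsistence
        dI = a * I * (y_i - y)   # innovation responds to income below the threshold
        N += dt * dN
        I += dt * dI
    y = y_m * I / (I + N)
    return y, p * (y - y_s)


def steady_state(p, a, y_s, y_i):
    """Steady-state income (A2.5) and per capita growth rate (A2.6)."""
    y_hat = (p * y_s + a * y_i) / (p + a)
    growth = p * a * (y_i - y_s) / (p + a)
    return y_hat, growth


if __name__ == "__main__":
    print("simulated:", simulate_innovation())
    print("predicted:", steady_state(1.0, 0.1, 1.0, 2.0))
```

Under these assumed parameters the simulated income settles between y_s and y_i, as the text describes, while N and I keep growing at the same constant proportional rate.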


NOTES 

We thank Joe Andrew, Ofer Bar-Yosef, Richard Redding, Bruce Winterhalder, and 
three anonymous referees for unusually constructive criticism of the manuscript. 
Thanks to Scott Elias for insights pertaining to Pleistocene seasonality and to Peter 
Ditlevsen for providing figure 17.2. Peter Lindert’s invitation to give a seminar led to 
the first draft of this chapter. Thanks to Francisco Gil-White for assistance with the 
Spanish abstract in the original article. 

1. We define “efficiency” as the productivity per unit area of land exploited for 
subsistence. Efficiency of subsistence is favored by strategies that move subsistence 
down the food chain, especially to high-productivity plant resources. “Intensification” 
we define as the use of human labor to add productive lower-ranked resources 
to the diet or the use of technological innovations to increase the rank of more 
productive resources. Typically both strategies are employed simultaneously. Since 
increases in efficiency are achieved by either labor or technical intensification and 
since increases in efficiency usually also lead to population growth, we use the term 
“intensification” loosely for the interlinked processes of labor and technical inten- 
sification and population growth. We define “agriculture” as dependence upon do- 
mesticated crops and animals for subsistence. We mark the origin of agriculture as 
the first horizon in which plant remains having anatomical markers of domestication 
are found, or are likely on other grounds to be found in the future. Fully agricultural 
subsistence systems in the sense of a dominance of domesticated species in the diet 
typically postdate the origin of agriculture by a millennium or more. 

2. It has also been argued that Pleistocene climates were less seasonally variable 
than during the Holocene, but this idea has scant empirical support (Miracle and 
O’Brien, 1998). Elias (1999) has used fossil beetle faunas to estimate July and Jan- 
uary temperatures in Holocene and Pleistocene deposits. These data suggest that the 
Pleistocene was more seasonal than the Holocene. However, beetle estimates of 
January temperatures are not very reliable because beetles in temperate and arctic 
climates overwinter in a dormant state so that their distributions are rather insensi- 
tive to winter as opposed to summer temperatures. Plant distributions are similarly 
affected. No current method of estimating winter temperatures in the Pleistocene is 
reliable. 

3. Agronomists and climatologists have recently become interested in the impacts 
of climate change and climate variability in the context of CO₂-induced global 
warming (Bazzaz and Sombroek, 1996; Downing, Olsthoorn, and Tol, 1999; Kane 
and Yohe, 2000; Reilly and Schimmelpfennig, 2000; Rosenzweig and Hillel, 1998; 
Schneider, Easterling, and Mearns, 2000). Global climate models suggest that global 
warming may increase short timescale climate variation as well as creating a steep 
trend. To some degree, these conditions mimic the millennial and submillennial scale 
variations in the Pleistocene, and, as crop-and-weather models and empirical data 
improve, more definitive assessments of impact of last glacial conditions on plant- 
based subsistence strategies will become possible. 

4. By “adaptive,” we mean behaviors that, by comparison with available al- 
ternatives, have the largest population mean fitness. 

5. Some human populations might have curtailed birth rates in order to pre- 
serve higher incomes at any given level of intensification. In a sense, such populations 
have just redefined K to be a lower value that permits higher incomes by employing 
what Malthus called the “preventative checks” on population growth. The rest of the 
analysis then applies with K measured in suitably emic terms. Cultural differences in 
the value of intensification threshold or K (Coale, 1986) will make evidence of stress 
more likely in populations where the effective carrying capacity is closer to the 
ultimate subsistence carrying capacity than in populations that reduce growth rates 
by preventative checks that keep population well below absolute subsistence limits. 
The perceived costs of population control, given that the main mechanism in non- 
modern societies was infanticide and sexual abstinence, may mean that most popu- 
lations intensified labor inputs at any given level of technology efficiency to near 
subsistence limits (Hayden, 1981). In either event, population pressure will tend to 
stay constant to the extent that rates of population growth and intensification are suc- 
cessful in adjusting subsistence to current conditions. Normally population growth 
and decline are quite rapid processes relative to rates of innovation and will keep 
average population size quite close to K. Short-term departures from K caused by 
short-term environmental shocks and windfalls should be the commonest reasons to 
see especially stressed or unstressed populations. If the rate of innovation is more 
rapid than exponential population growth for any significant time period, then per 
capita incomes can rise under a regime of very rapid population growth, as in the last 
few centuries. This regime, if it had occurred in the past, should be quite visible in 
the historical and archaeological record because it so rapidly leads to large popula- 
tions and large-scale creation of durable artifacts. Alternatively, population growth 
may have been limited in past populations by the analog of the modern demographic 
transition. Thus, hunter-gatherers might have resisted the adoption of plant-based 
intensification because they viewed the life style associated with plant collecting or 
planting as a decrement to their incomes. However, resisting intensifications that 
increase human densities makes such groups vulnerable to competitive displacement 
by the intensifiers unless the greater wealth of the population limiters allows them to 
successfully defend their resource-rich territories. On the evidence of the fairly rapid 
rate of spread of intensified strategies once invented, such defense is seldom suc- 
cessful (e.g., Ammerman and Cavalli-Sforza, 1984; Bettinger and Baumhoff, 1982). 

6. The dates in table 17.1 reflect considerable recent revision stemming from 
accelerator mass spectrometry ¹⁴C dating, which permits the use of very small carbon 
samples and can be applied directly to carbonized seeds and other plant parts 
showing morphological changes associated with domestication. Isolated seeds tend to 
work their way deep into archaeological deposits, and dates based on associated large 
carbon samples (usually charcoal) often gave anomalously early dates. 
PART 5 

Links to Other Disciplines 


Biology is an immense enterprise whose purview ranges from the 
physics of enzyme catalysis to the role of gene expression in cell differentiation 
to the evolutionary origins of flight to the global carbon cycle. Nonetheless, 
biology is a single discipline that is taught as a coherent, integrated subject to 
first-year university students. By contrast, each social science has its own in- 
dependent introductory course, one that usually makes little reference to 
other disciplines. The rigid division of human sciences into disciplines has 
always seemed quite odd to us. In the great scheme of things, humans surely 
present a smaller range of phenomena than all the rest of biology. One reason 
why biology remains a unified discipline is that the science has a small set 
of unifying problems at its core. Physics and chemistry underpin everything. 
Genetics, cell metabolism, ecology, and evolution are relevant to all organ- 
isms, and physiology is common to all multicellular life. A good basic biology 
course will show how these integrating subdisciplines relate to one another. 
Practicing biologists often discover that they need to know something of each 
of these integrating subjects in their professional careers. How can the human 
sciences possibly be very different? 

We have no clear idea of why the human sciences have evolved so dif- 
ferently from biology. Our mentor Donald T. Campbell took an interest in 
such matters (Campbell, 1969) and supposed that the social sciences would 
become much more interdisciplinary than they in fact have. In this part, we 
argue that evolutionary theory, specifically the theory of cultural evolution, 
stands ready to play much the same role that organic evolution does in biology. 
The basic argument is very simple. What is the most dramatic feature of 
human life? Certainly one candidate is its dramatic variation in time and space. 
No other species changes its behavior so rapidly, and none occupies such a 
wide range of environments using such a wide range of economic strategies. 
Evolutionary processes produce this diversity; every culture has descended 
from some immediate ancestor, ultimately tracing back to a common African 
ancestor. Every discipline in the human sciences is centrally concerned with 
cultural evolution and cultural diversity, whether called by these names or not. 
Anthropologists have made the study of cultural diversity their specialty. 
Historians study cultural change in all its forms. For economists, the evolution 
(or growth) of economies is a central theme. Political scientists study opinion, 
policy, and constitutional change; sociologists, institutional change. Cultural 
evolutionists have something to say about some central topics, such as the 
explanation of human cooperation and social institutions (parts 2 and 3 
contain examples). Should human scientists care to emphasize unifying 
problems, cultural evolution can share a portion of the burden. 

Chapter 18 shows economists how a theory of cultural evolution quite 
naturally complements the rational choice theory that is basic to their disci- 
pline. Rational choice theory is one of the other candidates to be a major 
unifying element in the human sciences. Yet rational choice theory famously 
lacks psychological realism (Simon, 1959) and lacks an explicit temporal 
dimension (Nelson and Winter, 1982). Here we derive the basic Darwinian 
theory of cultural evolution from Bayesian assumptions applicable to the 
standard rational actor. The behaviors of others are merely a form of proxy 
information about the world, a resource to be tapped in deciding how to be- 
have oneself. In a world where gaining information tends to be costly, imitating 
what others do is an excellent strategy under a wide variety of circumstances. 
An inheritance system provides time-tested information. Using your parents’ 
beliefs or those of others as Bayesian priors is highly adaptive. Doing so allows 
an individual to concentrate scarce resources on updating decent priors rather 
than on starting with less information-rich priors, such as those furnished by 
a generic human nature. Adding these bits of psychological realism yields a 
theory of cultural evolution within which boundedly rational actors play a 
fundamental role. The theory also accounts for important human oddities such 
as our extraordinary cooperation and our susceptibility to certain types of 
maladaptations. Neat as one of Adam Smith’s pins we thought, and still think, 
though the manuscript was rejected by the American Economic Review after 
protracted adventures with editors and reviewers. We suspect the baleful in- 
fluence of the lack of a synthetic first-year course is at work here. Subjects not 
legitimated in that course, which purports to encompass all someone needs to 
know, are deeply suspect, and culture is generally absent from Econ 1. At the 
same time, an economic anthropologist who taught us a lot about the science 
of culture knew little of what is taught in Econ 1. 

Chapter 19 is directed at those in the social sciences unfamiliar with a 
style of deploying mathematical models that is second nature to economists, 
evolutionary biologists, engineers, and others. Much science in many dis- 
ciplines consists of a toolkit of very simple mathematical models. To many not 
familiar with the subtle art of the simple model, such formal exercises have 
two seemingly deadly flaws. First, they are not easy to follow. The modern 
style of mathematical analysis uses a very compact notation that facilitates 
algebra but is quite hard to read. Even the initiated reader might take days to 
deeply understand even a rather elementary model. The untrained are nearly 
helpless. Second, motivation to follow the math is often wanting because the 
model is so cartoonishly simple relative to the real world being analyzed. 
Critics often level the charge “reductionism” with what they take to be 
devastating effect. The modeler's reply is that these two criticisms actually 
point in opposite directions and sum to nothing. True, the model is quite 
simple relative to reality, but even so, the analysis is difficult. The real lesson 
is that complex phenomena like culture require a humble approach. 

We have to bite off tiny bits of reality to analyze and build up a more global 
knowledge step by patient step. Experimentalists know the same lesson. To 
achieve virtues of experimental control of variables, you have to examine only 
one or a few variables at a time. Similarly, observational studies must examine 
a relatively few dimensions if any explanatory power is to result. Simple 
models, simple experiments, and simple observational programs are the best 
the human mind can do in the face of the awesome complexity of nature. The 
alternatives to simple models are either complex models or verbal descriptions 
and analysis. Complex models are sometimes useful for their predictive 
power, but they have the vice of being difficult or impossible to understand. 
The heuristic value of simple models in schooling our intuition about natural 
processes is exceedingly important, even when their predictive power is limited. 
(The predictive power of complex models is no better; they often sacrifice 
much transparency for little improvement in predictive power.) Verbal 
reasoning is exceedingly important because the human mind seems to be a 
verbal organ. However, words alone can be snares and delusions. Unaided 
verbal reasoning can be unreliable — words are polysemic, and the phenomena 
of the world have quantitative dimensions poorly captured by the qualita- 
tive concepts of natural language. The lesson, we think, is that all serious 
students of human behavior need to know enough math to at least appreciate 
the contributions simple mathematical models make to the understanding 
of complex phenomena. The idea that social scientists need less math than 
biologists or other natural scientists is completely mistaken. 

Chapter 20 deals with the vexatious concept of memes. On the one hand, 
we have great sympathy with the views of the “universal” Darwinists like 
Daniel Dennett, Robert Aunger, and Susan Blackmore, who, following 
Richard Dawkins, employ the term to stress the analogies between genes and 
culture. On the other hand, we have several worries. One is academic 
punctilio. When Dawkins (1976) coined the term meme, he quite frankly 
admitted that he had done no scholarship in the social sciences. Fair enough in 
the context of a trade book, but, in fact, another pioneering universal 
Darwinist, Donald Campbell (1965, 1975), had done significant work on 
cultural evolution by 1976. Luca Cavalli-Sforza and Marc Feldman (1973) 
had already published their pioneering formal models of cultural evolution. 
The second, more substantive problem is that the analogy between genes and 
culture is not very deep. The two are similar in that important information is 
transmitted between individuals. Both systems create patterns of heritable 
variation, which in turn implies that the population-level properties of both 
systems are important. Population-level properties require broadly Darwinian 
methods for analysis. But this just about exhausts the similarities. The list of 
differences is much larger. Culture is not based on direct replication but upon 
teaching and imitation. The transmission of culture is temporally extended. It 
is not necessarily particulate. Psychological processes have a direct impact on 
what is transmitted and remembered. These psychological effects can produce 
complex adaptations in the absence of natural selection. Users of the meme 
concept seem to us to believe that it does more work than it really does. Third, 
most users of the meme concept follow Dawkins in being rather incurious 
about the existing scholarship on the nature of cultural transmission. A large 
amount of data already exists on how 
culture works as an inheritance system and as an evolutionary system. 
Linguists are perhaps the most advanced students of memes (e.g., Bloom, 
2000). Building upon such existing scholarship is surely the most effective way 
to make progress. Other domains of culture — social organization, technology, 
folk science — may be governed by rather different principles. The job of 
synthesizing what we already know and drawing lessons for future work is left 
undone to the extent that we think that the analogy with genes is a sufficient 
foundation for a science of culture. It isn't. 

We believe that the Darwinian theory of cultural evolution will make 
contributions across the broad sweep of problems in the human sciences, but 
the project is one of introducing additional useful tools and unifying concepts 
rather than an imperial ambition to replace great swaths of existing theory or 
methods. 
18 Rationality, Imitation, 
and Tradition 


When the quality of information is poor, people often rely on 
tradition in making economic decisions. What is the best retail markup per- 
centage? When should one refinance one’s home? What is the right safety factor 
in designing a building? Retailers, homeowners, and engineers typically make 
such decisions using traditionally acquired rules-of-thumb. This tactic has both 
advantages and disadvantages. It can be useful because solving problems from 
scratch is difficult and costly. On the other hand, the uncritical adoption of 
traditional solutions to problems can lead people to acquire outmoded or even 
completely unfounded beliefs. Peasants sometimes resist beneficial innovations 
proffered by development agencies and retain traditional agricultural practices; 
many contemporary Americans maintain the unfounded belief that there are 
innate differences between the members of different ethnic groups. 

The fact that tradition is sometimes reliable and other times misleading 
creates an interesting problem for economists. Traditions often work; when they 
do, they are useful because they reduce the costs of acquiring information and 
lower the possibility of making errors. However, if everyone were to depend 
exclusively on traditional rules, what would cause traditional rules to be modi- 
fied in response to changes in the environment, and what would initially cause 
useful and reliable behaviors to become traditions? 

Conventional economic theory is not helpful in answering this question 
(Conlisk, 1980). Economists have adopted the Bayesian theory of rational choice 
as the natural extension of the utility-maximizing view of human behavior when 
there is uncertainty and use it as a positive theory to predict people’s behavior 
in a wide variety of contexts (Hirshleifer and Riley, 1978). Within the context of 
this theory, a person’s beliefs about the world are represented as a subjective 
probability distribution. Once this distribution is specified, the theory tells us 
how rational people should behave and how they should modify their beliefs in 
accord with their experience. The theory does not tell us why people initially 
come to have the beliefs that they do but simply takes them as given. 

The role of traditional knowledge has been discussed by some economists, 
but the processes that lead to sensible traditions seem to have been largely 
ignored. Hayek (1978) believes that limited knowledge and cognitive abilities 
force people to rely on traditional beliefs and values and argues that traditions are 
sensible because groups with favorable traditions survive longer and attract more 
members. Proponents of evolutionary models of firms (Alchian, 1950; Nelson 
and Winter, 1982) assume that beliefs, values, and other determinants of firm 
behavior are transmitted within firms and that these beliefs are shaped by the 
natural selection of firms. The only formal theoretical treatment of tradition 
seems to be the interesting article of Conlisk (1980) in which the individuals 
who optimize compete with individuals who acquire their behavior by imitation. 
If optimization is costly, Conlisk shows that imitation can persist in the popu- 
lation. 

In this chapter, we introduce tradition into conventional theory by assuming 
that people acquire their initial subjective probabilities by imitating their par- 
ents, relatives, teachers, business associates, and friends, but otherwise behave as 
classical Bayesian rationalists. Several lines of empirical evidence support the 
assumption that people acquire their beliefs about the world by imitation and 
similar processes. Psychologists have shown that children readily acquire be- 
havioral traits from moral beliefs to rules of grammar by imitating adult models 
(Bandura, 1977; Rosenthal and Zimmerman, 1978). Data collected on familial 
resemblances show high parent-offspring correlations for a wide variety of cog- 
nitive traits (I.Q.; Scarr and Weinberg, 1976), behaviors (child abuse, alco- 
holism; Smith, 1975), and indicators of beliefs (religious and political-party 
affiliation; Fuller and Thompson, 1960). A wealth of anthropological data sug- 
gests that human groups possess considerable cultural inertia; members of 
groups with different cultural histories behave quite differently even when living 
in similar environments (e.g., Edgerton, 1971). There is also evidence that in- 
dividuals acquire new beliefs by imitation when they enter organizations such as 
business firms (Van Maanen and Schein, 1979) and that this process causes 
distinct cultures to develop in different organizations. (This body of evidence is 
reviewed in more detail in Boyd and Richerson, 1985:38-60.) 

The assumption that people acquire their beliefs by imitation leads to 
models that keep track of the processes that change the frequency of alternative 
beliefs in a population of decision makers. To understand why a particular 
person acquires a particular set of beliefs, we must know to what kinds of be- 
havior naive individuals are exposed. This in turn will depend on the distribution 
of beliefs (and thus behaviors) that exist in the population. A person in a village 
in which many people have adopted modern farming practices is more likely to 
acquire the beliefs that underlie such practices than a person exposed only 
to traditional lifeways. To predict the distribution of beliefs in the population at 
some future time, we must know the present distribution of beliefs and account 
for all of the processes that change that distribution through time. Here we 
present several such models of cultural change. For a more extensive exposition 
of our views, see Boyd and Richerson (1985), and for related work, see Pulliam 
and Dunford (1980), Cavalli-Sforza and Feldman (1981), Lumsden and Wilson 
(1981), and Rogers (1989). 

These models are different from Conlisk’s in two important ways: (1) 
Conlisk regards imitation as an alternative to optimization; individuals are either 
imitators or optimizers. We assume that imitation is a precondition for opti- 
mization; everyone must acquire beliefs about the world before they can opti- 
mize. (2) Conlisk simply posits dynamical relations between variables that 
describe a whole population of decision makers; we are more concerned to show 
how the details of individual imitation and decision-making processes lead to the 
dynamics of the distribution of beliefs in a population through time. As we shall 
see, the optimal behavior in these models is usually for individuals to mix imi- 
tation and individual decision making, depending on how the temporal dynamics 
work out. 

We think that there are three lessons to be drawn from our theory of tra- 
ditions: first, there are plausible circumstances in which it is optimal to depend 
nearly completely on tradition at equilibrium. Second, there are plausible ge- 
netic and cultural mechanisms that could cause people to achieve this equilib- 
rium. Third, when people do depend largely on tradition, processes other than 
individual choice may have important effects on why people behave the way 
they do. We will begin by modeling a reference case in which people acquire 
their initial subjective probabilities by imitation and then modify them in ac- 
cordance with their own experience in a uniform and constant environment. 
This model indicates that when beliefs are transmitted culturally, greater reli- 
ance on tradition always leads to higher expected utility. We will then add 
environmental variability to the model. When the optimal behavior varies be- 
cause individuals encounter different environments, there is an optimal level of 
dependence on tradition. If there is a substantial chance that individuals and the 
people that they imitate experience the same environment, and if the infor- 
mation available to update priors is poor, it can be an evolutionary equilibrium 
to rely almost completely on tradition. In the simplest model, a population of 
such individuals will, on the average, behave almost as if they were perfect- 
information optimizers. However, in such a population other processes, which 
can lead to both beneficial (but poorly understood) beliefs or deleterious su- 
perstitions, may also be important. Finally, we will argue that there are cultural 
processes that may cause people to be characterized by an optimal reliance on 
tradition. 


The Basic Model 

In the first and simplest model there are only two processes that affect the dis- 
tribution of beliefs in a population of decision makers. First, individuals use 
available information to update their subjective probability distributions. Second, 
the frequency of different beliefs is changed by the transmission of these beliefs 
to another generation. The model has three parts: a description of how single 
individuals modify their beliefs in light of their experience (a process we refer to 
as “individual learning”), a consideration of how individual learning affects the 
distribution of beliefs in a population of individuals, and a mechanism for passing 
one generation’s beliefs to the next. 

Consider the following very simple decision problem. An individual decision 
maker has the following utility function: 

$$u(y, z) = -u_0 (z - y)^2 \qquad (1)$$

where z is a decision variable under his control, y is a variable that represents the 
state of the world, and u_0 is a constant. While the quadratic form of this utility 
function is unconventional in the theory of the consumer, it is a mathematically 
convenient representation of the usual view of individual choice. To see this, 
consider the following example: suppose that the decision maker is a young 
professional just beginning his or her career and that z represents the amount of 
time devoted to career advancement. The remainder of the young professional’s 
time, t, is devoted to family and recreation. Then t and z are arguments of 
a personal “production function,” which gives amounts of various “commodi- 
ties,” for example, income and marital happiness, produced for each combina- 
tion of t and z. The consumption of these commodities in turn generates utility. 
By using the constraint that total time is fixed and assuming that the young 
professional’s personal production and utility functions have the appropriate 
convexity properties, one could derive a unimodal function giving utility as a 
function of z. The optimum value of this function, y, would depend on the prop- 
erties of the personal production function, which in turn will depend on the 
state of the world. For example, the relationship between time devoted to work 
and income might depend on what kind of firm the young professional has 
entered. While the utility function so derived is unlikely to be exactly quadratic, 
this functional form is a reasonable caricature of a more general unimodal 
function. In fact, one could think of it as the first two terms of a Taylor’s series 
expansion of an arbitrary utility function in the neighborhood of the optimum. 
Because we have not specified how commodities map onto utilities, this model 
can represent any degree of risk preference. 

The individual does not know the value of y with certainty, but his or her 
beliefs about the likelihood that y takes on various values conform to a normal 
probability distribution with mean ŷ and variance L. Note that y is not a random 
variable; in a given environment there is an optimum amount of time devoted to 
career. The probability distribution describes the decision maker’s subjective 
beliefs about what value of z is optimum. 

Before making his or her choice, the decision maker has the opportunity to 
review a certain amount of evidence about the state of the world. For example, 
by observing the effects of time devoted to work on career advancement and 
home life, the young professional could get an estimate of the optimal amount 
of time to devote to work. Because our young professional’s initial rate of ad- 
vancement and domestic satisfaction might depend on a variety of factors other 
than the amount of time devoted to work, this estimate will be imperfect. 
Suppose that this evidence can be quantified by the variable x. The decision
maker believes (correctly) that the value of x is normally distributed with mean y
and variance V_e. After using this evidence and Bayes’s law, the decision maker’s
updated subjective probability distribution is normal with mean ŷ′, where

\[
\hat{y}' = \frac{V_e \hat{y} + L x}{V_e + L} \qquad (2)
\]

and ŷ is the mean of the prior distribution.


To simplify the development here, assume that the decision maker does not 
update the variance of his or her subjective probability distribution. 

The decision maker uses the updated distribution to calculate his or her 
expected utility as a function of z: 


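\[
E\{u(z, y) \mid \hat{y}, x\} = -U_0 \left[ (z - \hat{y}')^2 + L \right] \qquad (3)
\]

where ŷ′ is the updated mean from equation (2). Thus, the value of z that
maximizes his or her expected utility, z*, is the following:

\[
z^* = \hat{y}' \qquad (4)
\]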

That is, the optimal behavior is the individual’s posterior estimate of the most 
likely state of the environment. 
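
A minimal Python sketch of equations (2) through (4) may make the updating step
concrete. It assumes the quadratic utility described above with an arbitrary
scale U0 = 1 and keeps the variance fixed at L, as the text stipulates. The
function names and the numerical values are illustrative assumptions, not part
of the original model.

```python
def posterior_mean(prior_mean, x, L, V_e):
    """Equation (2): weighted average of the prior mean and the evidence x."""
    return (V_e * prior_mean + L * x) / (V_e + L)


def expected_utility(z, post_mean, L, U0=1.0):
    """Equation (3): expected quadratic utility with the variance held at L."""
    return -U0 * ((z - post_mean) ** 2 + L)


def best_behavior(post_mean, L, lo=-5.0, hi=5.0, steps=10001):
    """Brute-force search for the z that maximizes equation (3) on a grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda z: expected_utility(z, post_mean, L))


# Illustrative numbers only (assumed, not taken from the text).
y_post = posterior_mean(prior_mean=1.0, x=3.0, L=1.0, V_e=3.0)
z_star = best_behavior(y_post, L=1.0)
print(y_post, z_star)
```

With a prior mean of 1, evidence x = 3, L = 1, and V_e = 3, the posterior mean
is 1.5, and the grid search returns the same value to within the grid spacing,
as equation (4) requires.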

Now, suppose that there is a large population of decision makers. The in- 
dividuals who make up this population differ in only two respects: (1) they have 
different prior beliefs about the most likely state of the world, and (2) they are 
exposed to different evidence about the state of the world. To formalize the first
assumption, we assume that the frequency distribution of prior means ŷ in the
population before the subjective probability distributions have been updated,
Q_t(ŷ), is normal with mean M_t and variance B_t. Notice that this is a
description of the population, not a probability density. To formalize the second
assumption, we assume that the value of x experienced by each different
individual is an independent random variable with the density p(x), which has a
mean equal to the true state of the world, y, and variance V_e. Otherwise, all
individuals are identical; in particular, they all have the same utility function
and their subjective probability distribution is characterized by the same value
of L.

Let us now consider how the use of Bayes’s law by individuals to modify
their beliefs changes the frequency distribution of ŷ in the population. The
distribution of updated means ŷ′ in the population of decision makers after
updating, Q′_t, is as follows:

\[
Q'_t(\hat{y}') = \iint h(\hat{y}' \mid \hat{y}, x)\, Q_t(\hat{y})\, p(x)\, d\hat{y}\, dx \qquad (5)
\]

where h(ŷ′ | ŷ, x) is the conditional density of an individual’s belief after
updating, given that the individual had beliefs characterized by ŷ before
updating and observed x. Then Q′_t(ŷ′) is normal with this mean:


\[
M'_t = \frac{M_t V_e + y L}{V_e + L} \qquad (6)
\]

and variance:

\[
B'_t = \frac{B_t V_e^2 + V_e L^2}{(V_e + L)^2} \qquad (7)
\]

Thus, after updating, the mean value of ŷ moves closer to the correct value, y;
the variance may either increase or decrease depending on the magnitudes of B_t,
V_e, and L.
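
Equations (6) and (7) can be checked against a direct simulation. The sketch
below is only an illustration under stated assumptions: prior means are drawn
from a normal distribution with mean M_t and variance B_t, evidence is drawn
from a normal distribution with mean y and variance V_e, and each simulated
individual applies equation (2). The function names, sample size, seed, and
parameter values are hypothetical choices.

```python
import random


def updated_moments(M_t, B_t, y, L, V_e):
    """Equations (6) and (7): population mean and variance after updating."""
    M_new = (M_t * V_e + y * L) / (V_e + L)
    B_new = (B_t * V_e**2 + V_e * L**2) / (V_e + L) ** 2
    return M_new, B_new


def simulated_moments(M_t, B_t, y, L, V_e, size=100_000, seed=1):
    """Draw priors and evidence, apply equation (2), and summarize the result."""
    rng = random.Random(seed)
    updated = []
    for _ in range(size):
        prior_mean = rng.gauss(M_t, B_t ** 0.5)  # prior means vary across individuals
        x = rng.gauss(y, V_e ** 0.5)             # noisy evidence centered on y
        updated.append((V_e * prior_mean + L * x) / (V_e + L))
    mean = sum(updated) / size
    variance = sum((u - mean) ** 2 for u in updated) / size
    return mean, variance


# Illustrative numbers only (assumed, not taken from the text).
exact = updated_moments(M_t=0.0, B_t=2.0, y=1.0, L=1.0, V_e=3.0)
approx = simulated_moments(M_t=0.0, B_t=2.0, y=1.0, L=1.0, V_e=3.0)
print(exact, approx)
```

For these values the formulas give a mean of 0.25 and a variance of 1.3125; the
simulated moments should agree up to sampling error.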

So far, we have followed the usual practice of taking the decision maker’s 
initial subjective probabilities as given. We are now in a position to consider the 
effect of the transmission of these beliefs to another “generation” of decision 
makers by imitation. For example, suppose that the young professionals advance 
in their firm and are eventually replaced by a new cohort of entry-level profes- 
sionals, who form a new population of decision makers and face the same de- 
cision problem that their predecessors faced. Initially the individuals in this 
second “generation” are naive; they have no beliefs of any kind about how much 
time should be devoted to work. However, each naive individual has been able 
to observe n models of behavior of the previous generation of professionals. 
Based on the behavior of their models, naive individuals are able to infer what 
each model believes about how much time should be devoted to one’s profession.
Then each of the naive individuals adopts the mean of the n inferred updated
beliefs, ŷ′, that characterize their models as the mean of their own subjective
probability distributions. We assume that the variance, L, remains constant at
the same value as in the previous generation.

With these assumptions the distribution of ŷ in the population just before
updating in generation t + 1, Q_{t+1}(ŷ), is normal with mean M_{t+1} = M′_t and
variance B_{t+1} = (1/n)B′_t. Because the distribution of ŷ remains normal, the
state of the population of decision makers at any time can be specified by the
mean and variance of ŷ. If the environment remains constant, the values of the
mean and variance in the population will eventually reach a unique stable
equilibrium, M and B, where

\[
M = y \qquad (8)
\]

\[
B = \frac{V_e L^2}{n (V_e + L)^2 - V_e^2} \qquad (9)
\]


Equations (6) and (8) say that the effect of the repeated application of Bayesian
inference and accurate imitation on the mean value of ŷ is unambiguous: the
average of the best guesses about the state of the environment in the population
converges monotonically to the actual state of the environment. According to (7)
and (9), however, the variance of ŷ is affected by competing processes. New
variation is introduced each generation by errors in individual learning; this
process acts to increase B. On average, however, inference causes beliefs about
the environment to become more accurate, and this decreases B. Finally, if n > 1,
imitation itself acts to decrease the variance of ŷ in the population.
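
The approach to equilibrium can be traced by iterating the recursions directly.
The sketch below alternates the updating step, equations (6) and (7), with the
imitation step, in which the mean is unchanged and the variance is divided by n,
and compares the result with equations (8) and (9). Starting values, parameter
values, and function names are assumptions chosen for illustration.

```python
def equilibrium_variance(L, V_e, n):
    """Equation (9): equilibrium variance of beliefs in a constant environment."""
    return V_e * L**2 / (n * (V_e + L) ** 2 - V_e**2)


def iterate_generations(M0, B0, y, L, V_e, n, generations=200):
    """Alternate Bayesian updating and unbiased imitation of n models."""
    M, B = M0, B0
    for _ in range(generations):
        M_upd = (M * V_e + y * L) / (V_e + L)                 # equation (6)
        B_upd = (B * V_e**2 + V_e * L**2) / (V_e + L) ** 2    # equation (7)
        M, B = M_upd, B_upd / n                               # averaging n models
    return M, B


# Illustrative numbers only (assumed, not taken from the text).
M_end, B_end = iterate_generations(M0=0.0, B0=5.0, y=2.0, L=1.0, V_e=3.0, n=3)
print(M_end, B_end, equilibrium_variance(L=1.0, V_e=3.0, n=3))
```

After enough generations the mean settles at y and the variance at the value
given by equation (9), whatever the starting point.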


The Evolutionarily Stable Amount of Tradition

The relative importance of tradition and individual learning is determined by the
relative magnitudes of the width of each individual’s initial prior probability
distribution (L) and the quality of the information available to individuals (V_e).
If L is small compared to V_e, young professionals’ work habits will be mostly
determined by the beliefs that they acquire by imitation. If L is large, the
information that individuals gather for themselves will be more important.

In this section we determine the evolutionarily stable, or ESS, value of L. To
do this, we find the value of L that when common in a population has higher 
expected utility than slightly different values of L. One way to justify the ESS 
approach is to assume that L is a genetically variable character and that utility is 
monotonically related to fitness. The ESS value of L is the value that prevents the 
rare genotypes from invading under the influence of natural selection. Some 
models of cultural transmission have very similar properties to genetic ones, and 
for our immediate purposes, we can think of L as evolving under the influence of 
either process. Clearly, cultural and genetic transmission also differ in important 
ways, for example, in the timescale over which they are relevant. Variations in 
reliance on tradition among contemporary societies likely require a cultural 
explanation, while a genetic model would be appropriate for studying the evo- 
lution of humans from apes. The penultimate section of the chapter will address 
several explicitly cultural mechanisms that can lead to the ESS. 

Consider a population in which most individuals have a learning rule
characterized by the parameter value, L, and that has reached the associated
equilibrium values M and B. The expected utility of an individual whose learning
rule is characterized by parameter L′ is the following:

\[
E\{u(z, y)\} = -U_0 \left\{ \frac{V_e^2}{(V_e + L')^2} \left[ (y - M)^2 + B \right] + \frac{L'^2 V_e}{(V_e + L')^2} \right\} \qquad (10)
\]

One can show that this expression for expected utility is concave with a global
maximum at the value of L′, denoted L̂:

\[
\hat{L} = (y - M)^2 + B \qquad (11)
\]

The term (y - M)^2 + B measures the closeness of the population’s beliefs about
the state of the world to its actual state; V_e measures the accuracy of the
information gained by each individual through his own experience. Relation (11)
(together with (1)) says individuals should rely on imitation in proportion to the
accuracy of the distribution of beliefs. If (y - M)^2 + B is large compared to V_e,
individuals should rely mainly on their own experience; if (y - M)^2 + B is small
compared to V_e, then it is optimal to depend mainly on imitation. This
expression does not depend on the assumption that the population is in
equilibrium nor that the environment is constant.
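
Relation (11) can be checked numerically. The sketch below assumes that expected
utility is minus U0 times the mean squared distance between behavior and the true
optimum, with behavior given by the update rule (2) for an individual who uses
L′ while drawing a prior mean from a population with mean M and variance B. This
reading of equation (10), the function names, and the parameter values are
assumptions made for illustration.

```python
def expected_utility_of_rule(L_prime, M, B, y, V_e, U0=1.0):
    """Expected utility of a rare learning rule L' (assumed form of equation (10))."""
    spread = (y - M) ** 2 + B  # inaccuracy of the beliefs available for imitation
    return -U0 * (V_e**2 * spread + L_prime**2 * V_e) / (V_e + L_prime) ** 2


def best_rule(M, B, y, V_e, hi=10.0, steps=100001):
    """Grid search for the L' that maximizes expected utility."""
    grid = [hi * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda Lp: expected_utility_of_rule(Lp, M, B, y, V_e))


# Illustrative numbers only: (y - M)^2 + B = 0.25 + 0.75 = 1.0.
L_best = best_rule(M=0.5, B=0.75, y=1.0, V_e=2.0)
print(L_best)
```

The maximizing value matches (y - M)^2 + B to within the grid spacing and, as the
text notes, does not depend on whether M and B are at equilibrium.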

Now, suppose that natural selection, or an analogous cultural process, favors
values of L that increase expected utility. Then because the optimum in equation
(11) is itself a function of L (at equilibrium M = y, so it reduces to B, which
depends on L), the population will eventually reach an ESS value of L, L*, such
that L* = B(L*). Using the expression for B given in equation (9), one can show
that the ESS value is L* = 0. At equilibrium, individuals will depend completely
on tradition and totally disregard the evidence presented by their own experience.

This result has an intuitive explanation. At equilibrium, the relative merit of
tradition and learning depends on the relative “noisiness” of the two sources of
information. Learning has two effects on the variance in the population. On
average, learning causes individuals’ estimates of y to move toward the correct
value and thus acts to reduce the variation in the population. However, errors
made during learning increase the variation of the population. Once the
population reaches equilibrium in a constant environment, the net effect of
learning is to maintain erroneous beliefs in the population. Decreasing L always
decreases B. Thus, any process that acts to change L so as to increase expected
utility will reduce L until experience plays no role in determining individual
beliefs.
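
The collapse of L toward zero can be illustrated with a deliberately crude
adjustment process. The sketch below assumes that in each round the population
adopts the value of L that would be optimal given the equilibrium variance
produced by the previous value, that is, L is replaced by B(L) from equation (9).
This round-by-round dynamic is an assumption made for illustration and is not the
evolutionary process analyzed in the text; names and starting values are
hypothetical.

```python
def equilibrium_variance(L, V_e, n):
    """Equation (9). The denominator n(V_e + L)^2 - V_e^2 is written in the
    algebraically identical expanded form to avoid cancellation for tiny L."""
    if L == 0.0:
        return 0.0
    return V_e * L**2 / ((n - 1) * V_e**2 + n * L * (2 * V_e + L))


def best_response_path(L0, V_e, n, rounds=30):
    """Repeatedly replace L by the equilibrium variance it generates."""
    path = [L0]
    for _ in range(rounds):
        path.append(equilibrium_variance(path[-1], V_e, n))
    return path


# Illustrative numbers only (assumed, not taken from the text).
path = best_response_path(L0=5.0, V_e=1.0, n=1)
print(path[:4], path[-1])
```

With n = 1 each step more than halves L, so the sequence shrinks toward zero,
which is the only nonnegative solution of L* = B(L*).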


Heterogeneous Environments 

There are good reasons to doubt the robustness of the conclusion of the previous 
section. So far, we have assumed that (1) every member of the population
experienced the same state of the world, (2) the state of the world did not vary
from generation to generation, and (3) all individuals had the same utility
function. Relaxing any one of these assumptions reduces the usefulness of
tradition. For example, consider a heterogeneous environment in which different
individuals experience different states of the world, but in which there is some
chance that individuals in one environment draw models from other environments.
In a given environment, people’s beliefs will tend toward the optimum
in that environment, but drawing models from diverse environments will reduce
the likelihood that an individual acquires beliefs that are appropriate to his or
her own environment. The models in this section show that a substantial reliance on
tradition may still be evolutionarily stable in a heterogeneous environment or in 
a population in which utility functions vary. We have shown elsewhere that this 
conclusion also holds true in an environment that changes through time (Boyd 
and Richerson, 1983, 1985: ch. 4). 

The essential feature of a heterogeneous environment is that different in- 
dividuals in the population experience different states of the world, formalized 
in terms of the value of y. Such variation might arise for many reasons. For 
example, different young professionals might work in different firms, practice 
different professions, or live in different regions. We will model heterogeneous 
environments by assuming that the probability that an individual in the popu- 
lation experiences the environment specified by the value y is given by a normal 
density function, f(y), with mean 0 and variance H. Setting the mean to 0 can be 
done without loss of generality since it sets only the origin from which different 
environments are measured. The variance, H, is a measure of the amount of 
environmental variation. 

Suppose that in the environment characterized by the value y, the frequency
of individuals with a subjective probability distribution characterized by a mean
ŷ before updating is normal with mean M_t(y) and variance B_t(y). Then the mean
and variance after updating in that environment are given by equations (6) and
(7) with the appropriate value of y. Further, suppose that there is a probability
1 - m that given models experience the same environment that their naive
imitators will experience and a probability, m, that models are drawn at random
from the population as a whole. Thus, for example, some of a particular young
professional’s models might be drawn from another firm in which more (or less)
dedication is required to succeed. This model also applies to a population of
individuals who live in a uniform environment but whose utility functions have
different optima.

With these assumptions, one can derive recursions for the mean and vari- 
ance of the distribution of prior beliefs in each environment. One can show that 
the equilibrium mean in habitat y is the following:

\[
M(y) = \frac{(1 - m)\, y L}{m V_e + L} \qquad (12)
\]


Equation (12) says that in a heterogeneous environment, individuals on average
have incorrect beliefs about their environment. The mean value of ŷ in any
environment y results from the balance of two forces. The Bayesian learning
process tends to move the mean toward the correct value for that environment,
but the exposure to models drawn from other environments moves the mean
toward the mean for the entire population, 0. To find the equilibrium variance,
we proceed exactly as in the previous section.
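
The balance of forces behind equation (12) can be sketched as a one-line
recursion. The code below assumes that, within habitat y, the mean is first
updated by equation (6) and then mixed with the population-wide mean in
proportions 1 - m and m, and that the population-wide mean stays at 0. The
recursion itself is not written out in the text, so it, the function names, and
the parameter values should be read as assumptions.

```python
def equilibrium_mean(y, m, L, V_e):
    """Equation (12): equilibrium mean belief in habitat y."""
    return (1 - m) * y * L / (m * V_e + L)


def iterate_local_mean(y, m, L, V_e, M0=0.0, generations=500):
    """Alternate local updating with imitation of partly foreign models."""
    M = M0
    for _ in range(generations):
        updated = (M * V_e + y * L) / (V_e + L)  # equation (6) within habitat y
        M = (1 - m) * updated + m * 0.0          # foreign models average to 0 (assumed)
    return M


# Illustrative numbers only (assumed, not taken from the text).
M_local = iterate_local_mean(y=2.0, m=0.2, L=1.0, V_e=3.0)
print(M_local, equilibrium_mean(y=2.0, m=0.2, L=1.0, V_e=3.0))
```

In this example the local optimum is 2 but the equilibrium mean is only 1:
mixing pulls beliefs halfway back toward the population-wide mean.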

By averaging the expressions for the equilibrium mean and variance over all
habitats, and using the expression for the ESS value of L given by equation (11),
one can calculate L* in a heterogeneous environment. The results of this
calculation are shown in figure 18.1, which plots the relative importance of
imitation in determining behavior, V_e/(L* + V_e), as a function of V_e for several
values of m. This figure indicates that the equilibrium optimum amount of
imitation increases as the quality of the information available through individual
experience declines and as the probability that models are drawn from foreign
environments decreases.

Figure 18.1. Plot of the fractional importance of tradition in determining behavior
when the propensity to rely on tradition is at its equilibrium value, V_e/(L* + V_e),
as a function of the quality of information available to individuals (V_e) assuming a
heterogeneous environment, n = 1 and H = 1.0. Increasing values of m represent
increasing amounts of mixing of models among different environments.

These results make sense. The amount of imitation favored by evolutionary 
processes depends on the relative quality of two sources of information, the 
information available to individuals through their own experience and through 
observing the behavior of their models. As V_e increases, the quality of the
information available to individuals through experience declines. As m decreases,
the probability that an individual’s models will exhibit behavior that is
appropriate in the local environment increases. Thus, both increasing V_e and
decreasing m cause the equilibrium value of L to increase.

These results suggest that the conclusions of the first section are not entirely 
misleading. When the amount of mixing between environments is not too large 
and information is of low quality, individuals achieve the highest expected utility 
by relying mainly on tradition. We think that this combination of circumstances is 
not uncommon. The world is complicated and poorly understood and the effects 
of many decisions are experienced over the course of a lifetime. In deciding how 
much time to devote to their families, young professionals must estimate not only 
the immediate effect on their careers and home lives but also the long-run effects
on the development of their children’s adolescent behavior. In such cases the 
information available to individuals may be very poor indeed, and it is plausible 
that they are best off relying almost entirely on traditional beliefs. Also notice that 
figure 18.1 is a worst case for tradition because it assumes that there is only one
model (n = 1). As n increases, the equilibrium variance within environments
decreases, and, therefore, tradition is relatively more reliable. 

It is important to note that even when the amount of individual learning is 
small, it plays an important role in the evolutionary dynamics of the population. 
Some individual learning is necessary if traditional beliefs are to remain utilitarian 
in local environments in the face of imitation of experienced individuals from 
other environments. However, a relatively small amount of individual learning is 
sufficient to keep traditional behaviors on average reasonably near utilitarian 
optima, so long as mixing between heterogeneous environments is not too great. 


Biased Imitation 

To this point, we have assumed that individuals adopt a simple unbiased average 
of the beliefs of the models to which they are exposed. This may not be the most 
sensible procedure. It would seem better to preferentially imitate models whose 
behavior has been successful. Young professionals might imitate models who 
are particularly accomplished in their work and content in their private lives. 
More generally, naive individuals may imitate prosperous models, contented 
models, prestigious models, or devout models. By doing this, naive individuals 
will be more likely to acquire beliefs that lead to prosperity, devotion, content- 
ment, or prestige. In this section we show how this form of biased cultural 
transmission can increase the frequency of correct beliefs in a population, even
when individuals do not understand the causal connection between beliefs and
their consequences.

Suppose that instead of simply averaging the beliefs of their models, naive
individuals weight models according to their utility, with models achieving
higher utility having a greater influence on a naive individual’s initial belief
than individuals with lower utility. There are many plausible observable
correlates of utility, such as level of consumption. It seems likely that by
imitating individuals with higher levels of consumption, naive individuals might
increase their chances of acquiring beliefs that lead to higher utility. In
particular, suppose that the initial value of ŷ acquired by a naive individual
exposed to models with the utilities u_1, ..., u_n and beliefs ŷ_1, ..., ŷ_n is
this expression:

\[
\hat{y} = \frac{\sum_{i=1}^{n} \hat{y}_i (1 + b u_i)}{\sum_{i=1}^{n} (1 + b u_i)} \qquad (13)
\]

where b is a positive constant small enough that terms of order b^2 can be ignored.

With this assumption, it can be shown (Boyd and Richerson, 1985) that the
mean in the population after transmission is the following:

\[
M_{t+1} = M'_t + (1 - 1/n)\, B'_t\, E\{\mathrm{Reg}[\hat{y}, u(\hat{y})]\} \qquad (14)
\]

where E{Reg[ŷ, u(ŷ)]} is the regression of utility on ŷ averaged over all possible
sets of models. According to equation (14), the change in the mean due to biased
transmission depends on two factors: the amount of variability within sets of
models [(1 - 1/n)B′_t] and the extent to which beliefs about the world are
predictably related to utility [E{Reg[ŷ, u(ŷ)]}]. Variability within sets of models
is important because biased transmission is a culling process that works because
some models are more attractive than others. If all models are identical, biased
transmission can have no effect. The regression of utility on ŷ is a measure of the
average effect of a change in an individual’s beliefs on his or her utility. If it is
positive, individuals with larger values of ŷ will have higher utility and, therefore,
be more likely to be imitated. This will cause the mean value of ŷ in the
population to increase. Both the sign and the magnitude of E{Reg[ŷ, u(ŷ)]} depend
on the distribution of ŷ in the population. If M_t is less than the optimum value
(y), larger values of ŷ will on average lead to higher utility, and the regression
will be positive. The reverse will occur if M_t > y. This means that biased
transmission will leave the mean unchanged only if it is at the optimal value.
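
A small numerical example may help show the direction of the effect. The sketch
below assumes a normalized weighting rule in which each model receives weight
1 + b u_i, and it assumes that a model’s utility is the quadratic loss of acting
on its own belief. Both are illustrative assumptions, as are the function names
and the numbers; the code is not a derivation of equation (14).

```python
def utility(belief, y, U0=1.0):
    """Quadratic utility of a model whose behavior equals its belief (assumed)."""
    return -U0 * (belief - y) ** 2


def unbiased_average(beliefs):
    """Plain average of the models' beliefs, as in the earlier sections."""
    return sum(beliefs) / len(beliefs)


def biased_average(beliefs, utilities, b):
    """Utility-weighted average with weights 1 + b * u_i (assumed weighting)."""
    weights = [1.0 + b * u for u in utilities]
    if min(weights) <= 0:
        raise ValueError("b is too large: every weight must stay positive")
    return sum(w * v for w, v in zip(weights, beliefs)) / sum(weights)


# Three hypothetical models, all at or below the optimum y = 2.
y_true = 2.0
beliefs = [0.0, 1.0, 2.0]
utilities = [utility(v, y_true) for v in beliefs]
plain = unbiased_average(beliefs)
biased = biased_average(beliefs, utilities, b=0.05)
print(plain, biased)
```

Because the models closer to the optimum earn higher utility, the weighted
average lies above the plain average of 1.0 and so closer to the optimum, even
though the imitator never needs to know where the optimum is.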

Biased transmission is of interest because it can explain the existence of 
“folk wisdom,” beneficial but poorly understood customs. The preferential 
imitation of successful people will tend to increase beliefs and practices that lead 
to success; there is no need for individuals to understand the causal connection 
between traditional practice and success, even on the part of the individuals who 
invent the practices. 


Natural Selection 

So far we have assumed that the probability that a naive individual is exposed to 
models who are characterized by given beliefs (i.e., a given value of ŷ) is equal to
the frequency of that kind of individual in the previous generation. There is good
reason to suppose that this assumption is often violated. For example, the 
probability that young professionals are advanced in their firm is likely to depend 
on how much time they devote to work. Underachievers are likely to be fired 
and overachievers to be promoted. Thus, models who are available for imitation 
within a firm may represent a biased sample of the original population. More 
generally, if the behaviors that are shaped by the beliefs acquired by imitation are 
important, they may affect many aspects of individuals’ lives: whom they meet, 
how long they live, how many children they have, or whether they get tenure. 
All of these factors could affect the probability that an individual becomes 
available as a model for others. This means that individuals characterized by
some values of ŷ will end up being more likely to be imitated than individuals
with other values. All other things being equal, it is intuitive that this process,
which we will term “natural selection” because of its close resemblance to the
biological process, will increase the frequency of the variants most likely to
“survive” to enter the pool of models. For a more extensive discussion of the
natural selection of culturally transmitted behaviors, see Boyd and Richerson
(1985: 173-203).

To formalize this idea, we suppose that the probability that an individual
who chooses behavior z becomes available as a model, W(z), is the following:

\[
W(z) = \exp\left\{ -\frac{(z - w)^2}{2K} \right\} \qquad (15)
\]


where w is behavior that maximizes the probability of being in the model pool 
and 1/K is a measure of the intensity of the selection process. Note w need not 
equal y; for example, individuals who devote more than the utility maximizing 
amount of time to their work may be more likely to be promoted within the 
firm. 

Using (15) one can show that the mean value of ŷ in the population of
models (after selection), M″_t, is given by this equation:

\[
M''_t = \frac{M'_t K + w B'_t}{B'_t + K} \qquad (16)
\]


Thus, selection moves the mean value of ŷ in the population toward the value
that maximizes the probability of entering the pool of models, w. One can also
show that it reduces the variance of ŷ in the population. The strength of both
these effects is proportional to the variance in ŷ in the population and the
intensity of the selection process.
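
Equation (16) can be verified by weighting a normal distribution of behaviors by
the selection function (15) and integrating numerically. The sketch below assumes
that behavior before selection is normal with mean M and variance B (the primed
quantities in the text). The comparison value B K / (B + K) for the variance
after selection is the standard result for a product of two Gaussian curves and
is not stated in the text; function names and parameter values are illustrative
assumptions.

```python
import math


def selected_mean(M, B, w, K):
    """Equation (16): mean among individuals who enter the pool of models."""
    return (M * K + w * B) / (B + K)


def selected_moments_numeric(M, B, w, K, half_width=12.0, steps=20001):
    """Riemann-sum mean and variance of a normal density weighted by W(z)."""
    sd = math.sqrt(B)
    lo = M - half_width * sd
    dz = 2 * half_width * sd / (steps - 1)
    total = first = second = 0.0
    for i in range(steps):
        z = lo + i * dz
        density = math.exp(-(z - M) ** 2 / (2 * B))   # unnormalized normal density
        survival = math.exp(-(z - w) ** 2 / (2 * K))  # equation (15)
        weight = density * survival
        total += weight
        first += weight * z
        second += weight * z * z
    mean = first / total
    return mean, second / total - mean**2


# Illustrative numbers only (assumed, not taken from the text).
mean_num, var_num = selected_moments_numeric(M=0.0, B=1.0, w=2.0, K=3.0)
print(mean_num, selected_mean(M=0.0, B=1.0, w=2.0, K=3.0), var_num)
```

Here selection shifts the mean from 0 toward w = 2, to 0.5, and lowers the
variance from 1 to 0.75; a smaller K (stronger selection) would produce a larger
shift.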

Natural selection is important because it explains how a reliance on tradition 
can lead to erroneous or deleterious beliefs. Many social and economic processes 
affect the kinds of individuals available as models. Some of these processes act on 
the level of the individual, as in the case of the young professional. Others affect 
whole firms or institutions. For example, firms composed of overachievers may 
be more likely to survive and expand than firms composed of utility maximizers. 
When culturally acquired beliefs are important in determining people’s behav- 
ior, these selective processes will affect what kinds of people are available for 
imitation and therefore what beliefs will characterize the population. Since there
is no reason to believe that such selective processes always favor utility
maximizing behavior, selection may cause the most common beliefs in a population
to be deleterious. Nonetheless, if information is imperfect and costly to acquire, 
it may still be sensible to rely on tradition; a modest systematic error may be 
preferable to a larger random error. 

As an aside, we could also interpret the case of a naive manager being
socialized by overachievers as the acquisition of a new utility function, by
supposing that preferences are transmitted by tradition and modified by
evolutionary processes such as selection. Such a model would allow a more general
account of the relationship between learning and tradition than the Bayesian
framework used here permits, and it could reflect other models of the
decision-making process (e.g., Nelson and Winter’s, 1982, evolving “routines”).
To enlarge on these problems is, however, outside the scope of this chapter. Here
we want to emphasize
that the standard, and normatively appropriate, Bayesian model is incomplete 
without a theory of tradition. 


Cultural Mechanisms Leading to the ESS Amount of Imitation 

So far we have assumed that natural selection acting on genetic variation or an 
analogous cultural process causes the value of L to change in the direction of in- 
creasing expected utility. In this section we consider such cultural processes 
in more detail. Suppose that the relative dependence on tradition versus one's 
own experience itself is a culturally transmitted trait. Then each of the three 
mechanisms we have just studied can, under the right circumstances, act like 
natural selection to change L in the direction that increases expected utility. 

First, however, it is important to clarify why, within the context of the 
model outlined so far, it is not possible for individuals to choose directly the 
appropriate value of L. An essential assumption of this chapter is that the in- 
formation available to individuals is limited; they know the results of their own 
direct experience and the observable behavior of the individuals whom they had 
available to imitate, but they do not know the optimum behavior, y. From
equation (11), the optimal amount of imitation is given by the term B + (M - y)^2.
Individuals can estimate B and M from their sample of models, and under some 
circumstances this information might be sensibly used to modify L. They cannot 
choose the optimum value of L, however, because that value depends on how 
close the mean belief in the population is to the optimum, y. 

How do people acquire their attitudes toward tradition? Assume that 
people acquire their value of L by imitation during an earlier episode of social 
learning. With this assumption, any of the processes that change the frequency 
of a culturally transmitted trait could affect the evolution of the mean value of L 
in the population: 

1. Ordinary learning. Individuals might acquire an initial value of L by
imitation or teaching and then modify it in accordance with their
experience. For example, during enculturation, individuals must acquire
many different beliefs and behaviors. They might experiment
with different values of L during early episodes of learning, retaining
the value that seems to yield the best results. This process
would change the mean value of L among members of the population
in the direction that increased average utility.

2. Biased transmission. Suppose that available models are variable, some 
of them relying on tradition to a greater degree than others. More- 
over, suppose that naive individuals can observe some behavior of 
their models that serves as a useful index of the model’s utility. Then 
if naive individuals are predisposed to imitate successful models, the 
mean value of L in the population will move toward the optimum. 
Notice that this can be true even if, as we have assumed, individuals 
have no understanding of why certain beliefs lead to higher utilities. 

3. Natural selection. Once again assume that individuals vary in their 
attitudes toward tradition. Individuals with different values of L 
will, on average, behave differently. If an individual’s behavior 
affects the probability that he or she becomes a model, natural 
selection will change the mean value of L in the direction that 
increases the chance of acquiring behaviors that make an individual 
likely to become a model. To the extent that there is a correlation 
between the utility associated with a behavior and the probability 
that an individual with the same behavior will become a model, 
natural selection would modify L in a utility maximizing direction. 

To see how these processes might work, consider how attitudes toward tradition 
might change as a society undergoes industrialization. It is often thought that in 
pre-industrial agricultural societies people rely heavily on tradition. If one sup- 
poses that in such societies information is costly, then their reliance on tradition 
is sensible according to our model. Now, suppose that during industrialization, 
technical and institutional change makes information less costly. According to 
the model, people would be better off if they relied more on their own expe- 
rience and less on tradition. This might come about by any of the three processes 
mentioned. To some extent, individuals might have been able to infer from their 
own experience that a lower reliance on tradition improved their lot. More 
plausibly, during industrialization people with a tendency to rely more on their 
own experience and less on traditional beliefs might more readily acquire non- 
traditional skills that lead to wealth and other kinds of observable markers of 
success. If successful individuals are more likely to be imitated, biased trans- 
mission would decrease average reliance on tradition. Or less traditional in- 
dividuals might simply be more successful at becoming teachers, managers, and 
bureaucrats in modernizing societies. The natural selection mechanism could 
have favored a reduced dependence on tradition through differential achieve- 
ment of roles that are important in socialization. 

Invoking processes that affect earlier episodes of imitation to understand the 
nature of a subsequent episode clearly creates a problem of explanatory regress. 
Each of the three processes mentioned depends on some aspect of the imitation 
process, which then must be explained. In the case of ordinary learning,
individuals must have some way of weighting the importance of the value of L that
they acquired by imitation against the value that their experience indicates is
best. Do they rely on their experience or on imitation? In the case of biased
transmission, individuals must have some criterion of success: do they imitate
wealthy individuals? Content individuals? Even natural selection will differ in its
effects depending on whom naive individuals are prone to imitate. Are they
disproportionately affected by their parents, or are other individuals important?

Ultimately, these are questions about human nature. The answers must be 
sought in the long-run processes that govern the interactions of cultural and 
genetic evolution in our species. This topic has been discussed at length by us 
(Boyd and Richerson, 1985) and others (Pulliam and Dunford, 1980; Lumsden 
and Wilson, 1981; Durham, 1978). Our work supports two generalizations that 
are relevant here: 

1. If there is genetic variation that affects the tendency of people to 
imitate, natural selection will tend to modify this tendency so that it 
maximizes genetic fitness. Thus, to the extent that people prefer 
fitness-enhancing outcomes, selection would increase average utility. 

2. There are a variety of conditions in which the fitness-maximizing 
values of L are near 1. Thus, it is plausible that even the earliest 
episodes of imitation are not directly subject to genetic influences. 


Discussion 

The economic theory of rational choice under uncertainty is incomplete because 
it is silent about the source of people's initial beliefs about the world. People are 
not immortal; sometime between birth and adulthood they acquire a set of 
beliefs about the world. Because rational behavior, including the rational re- 
sponse to new information, depends on the nature of an individual’s prior beliefs, 
virtually any behavior can be rational, and therefore explicable, given some set of 
prior beliefs. A peasant’s initial resistance to a beneficial innovation is explicable 
if one supposes that he believes that traditional ways are superior to modern 
ones. His ultimate rejection of modern practices may also be rational if his beliefs 
are described by “tight” priors. 

In this chapter we have extended the economic theory of choice under un- 
certainty by assuming that individuals acquire their initial subjective probability 
distribution by imitation. In particular, we supposed that each naive individual 
observes the behavior of a number of experienced models sampled from a larger 
population, induces the belief that led to the observed behavior, and then adopts 
an average of those beliefs as his own initial beliefs. Then to understand why 
people acquire the initial beliefs that they do, we must understand why the 
population is characterized by a particular distribution of beliefs. This means that 
models that allow for imitation must account for all of the processes that change 
the distribution of beliefs in the population. Some of these processes will arise 
from individual learning and decision making, while others result from social and 
economic processes that have different effects on people with different beliefs. 
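
The imitation assumption described above can be sketched in a few lines of Python. This is an illustration only: the function names, the use of a plain arithmetic mean, and the linear weighting between the imitated belief and the individual's own estimate are assumptions of this sketch, not the chapter's exact model. The weight L is read here as the degree of reliance on tradition.

```python
from statistics import mean


def imitated_prior(model_beliefs):
    """Initial belief of a naive individual: the average of the beliefs
    inferred from the observed models (assumed here to be plain numbers)."""
    return mean(model_beliefs)


def adult_belief(model_beliefs, own_estimate, L):
    """Blend tradition and experience.

    L is the weight placed on the imitated belief (0 <= L <= 1);
    L near 1 means almost complete reliance on tradition.
    The linear blend is an assumption of this sketch.
    """
    if not 0.0 <= L <= 1.0:
        raise ValueError("L must lie between 0 and 1")
    return L * imitated_prior(model_beliefs) + (1.0 - L) * own_estimate


models = [2.0, 4.0, 6.0]  # hypothetical beliefs of three sampled models
print(adult_belief(models, own_estimate=10.0, L=0.9))  # mostly tradition
print(adult_belief(models, own_estimate=10.0, L=0.1))  # mostly experience
```

With L close to 1 the adult belief stays near the population average, so whatever shapes the distribution of beliefs among models largely determines what the next generation believes.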

This amendment to economic theory is not proposed as a behavioral alter- 
native to the usual assumption that people are rational optimizers. Whether they 
are optimizers or not, mortal individuals must acquire their initial beliefs from 
others. It may well be that the particular model of imitation we have chosen is 
incorrect, that Bayesian optimizing is a poor model of how humans make 
choices, or that genetic inheritance is important in determining people’s be- 
havioral predispositions. In any case, we believe that a complete theory of hu- 
man behavior would have a similar structure to the models outlined here; it 
would keep track of the dynamics of a population of decision makers by ac- 
counting for the processes that change the distribution of beliefs or other pre- 
dispositions in the population. Some of these processes will result from people’s 
attempts at improving their lot, while others will result from what happens to 
them because they hold the beliefs that they do. 

There are two lessons that can be drawn from the models presented here. 
First, they suggest that a strong reliance on tradition may indeed be sensible. At 
equilibrium, individuals may rely almost entirely on traditional knowledge and 
ignore any other information that may be available to them. When (1) the 
quality of information available to individuals is low and improving it is costly, 
and (2) there is a good chance that the individuals’ models experienced the same 
environment that they experience, traditional solutions to problems may be 
much closer to the optimal behavior, on the average, than the solutions that 
individuals could devise on their own. 

The theory also suggests, however, that when traditions are substantially 
more important in determining people’s beliefs than their own experience, a 
variety of processes other than individual learning may affect the commonness of 
different beliefs. When tradition is important, it acts like a system of inheritance 
to create heritable variation within and among groups. Processes like biased 
transmission and natural selection can then affect the frequency of different 
beliefs by making it more likely that some beliefs will be transmitted from one 
generation to the next. When the effect of individual experience is small, it is 
plausible that such processes may have an important effect on the way that 
people behave. 
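
As a rough illustration of how biased transmission can change the frequency of a belief, the sketch below iterates a simple two-variant recursion. The recursion p' = p + B p (1 - p), the parameter name B for the strength of the bias, and the starting values are assumptions chosen for this example; they are not taken from the models in this chapter.

```python
def biased_transmission_step(p, B):
    """One generation of biased transmission (assumed form).

    p is the frequency of the favored belief, B the strength of the bias
    (0 <= B <= 1). With B = 0 the frequency does not change.
    """
    return p + B * p * (1.0 - p)


def iterate(p0, B, generations):
    """Return the list of frequencies from generation 0 onward."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(biased_transmission_step(freqs[-1], B))
    return freqs


trajectory = iterate(p0=0.05, B=0.2, generations=50)
print(trajectory[0], trajectory[10], trajectory[-1])
```

Even a modest bias moves an initially rare belief toward high frequency over a few dozen generations, which is why such forces matter most when individual experience contributes little.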

Some of these processes, such as biased transmission, may increase the 
frequency of utility-enhancing behaviors. This fact is of interest because it may 
explain “folk wisdom,” that is, the fact that people hold beneficial traditions that 
they do not understand. The most striking examples of folk wisdom come from 
anthropological research. For example, in many parts of the New World native 
peoples treated maize with a strong base to produce foods such as hominy or masa 
as part of their traditional cuisine. Katz, Hediger, and Valleroy (1974) have 
shown that such treatment makes more of the amino acid lysine available (lysine 
is the least plentiful amino acid in maize). They have also shown that there was a 
strong negative correlation between the use of alkali treatment and the avail- 
ability of protein from sources other than maize. Given that many factors in- 
fluence nutrition, and that only small, uncontrolled samples were available, it is 
difficult to see how individuals in these cultures could have detected the effect of 
the treatment. Indeed, although Africans have been using maize as a staple for a 
few centuries, alkali cooking has not yet developed there. It seems more likely 
that it could spread because eating treated maize made people more successful 
or more likely to survive and, therefore, more likely to be imitated. Folk wis- 
dom also plays a role in economic thinking. Hayek (1978) argues that tradi- 
tional beliefs and institutional arrangements reflect wisdom beyond the ken of 
any individual, and he bases many political and economic prescriptions on this 
view. Similarly, proponents of an evolutionary view of the firm (e.g., Alchian, 
1950; Nelson and Winter, 1982) argue that inherited decision rules that deter- 
mine a firm’s response to market conditions may be sensible in ways that nobody 
in the firm understands. 

However, for other processes that affect the frequency of alternative beliefs 
in a population, such as natural selection, there is no guarantee that utility- 
maximizing behaviors will be favored. This may explain the existence of behavior 
that seems paradoxical under the usual assumption of individual rationality. In 
our example of natural selection on behaviors transmitted in the workplace, 
people could come to work harder than they would desire. Such behaviors could 
remain in a population because on average the traditions transmitted within a 
firm are more useful than alternative behaviors individuals could acquire by their 
own efforts. In other words, a reliance on tradition causes individuals to trade 
systematically suboptimal behaviors transmitted within the firm for the ran- 
domly suboptimal ones that can be discovered by individual effort. Elsewhere we 
show that processes other than natural selection can have this general effect 
(Boyd and Richerson, 1985). 

Finally, models of the kind described here may also be useful in clarifying the 
relationship between human evolution and contemporary human behavior. 
Hirshleifer (1977) has argued that one of the attractive features of sociobio- 
logical theory is that it provides an independent way to derive utility functions; 
namely, human preferences have been shaped by natural selection so that, at 
least in the context of a hunter-gatherer society, they enhanced genetic fitness. 
While we are sympathetic to this general approach, we have argued (Boyd and 
Richerson, 1985) that many human preferences are difficult to explain on this 
basis. For example, many contemporary professionals seem to sacrifice genetic 
fitness by delaying marriage, reducing family size, and limiting time devoted to 
child care in order to gain professional success. Such behavior is explicable, 
however, if one imagines that individuals who value professional accomplish- 
ment for its own sake are more likely to rise to positions of influence than those 
with more “sociobiological” values. To take another example, humans cooper- 
ate in large groups of unrelated individuals to provide public goods (such as 
victory in warfare) in a way that seems difficult to reconcile with individual 
fitness maximization. In the work cited, we have shown how some forms of 
cultural transmission, permitting selection on culture at the level of groups, can 
arise from attempts to use traditions to enhance the ends of genetic fitness. To 
take advantage of the economies of information acquisition that tradition offers 
requires a measure of blind trust of traditional wisdom. Such weak rational 
control on tradition by its users may be sensible but at the same time allows 
culture to respond to blind evolutionary processes unique to the cultural system 
of inheritance. These processes may ultimately have important effects on what 
individuals prefer as well as on what they believe. 

We thank Robert Brandon, John Conlisk, Jack Hirshleifer, Richard Nelson, Eric 
A. Smith, John Staddon, Robert Seyfarth, Joan Silk, Michael Wade, and John Wiley 
for providing comments on an earlier version of this chapter; we also thank John 
Gillespie and Ron Pullman for crucial insights about modeling environmental varia- 
tion and learning, respectively. As tradition dictates, we stipulate that any errors are 

15 Simple Models of Complex 
Phenomena 

The Case of Cultural Evolution 


A great deal of the progress in evolutionary biology has resulted 
from the deployment of relatively simple theoretical models. Staddon’s, Smith's, 
and Maynard Smith’s contributions illustrate this point. Despite their success, 
simple models have been subjected to a steady stream of criticism. The com- 
plexity of real social and biological phenomena is compared to the toylike quality 
of the simple models used to analyze them and their users charged with un- 
warranted reductionism or plain simplemindedness. 

This critique is intuitively appealing — complex phenomena would seem to 
require complex theories to understand them — but misleading. In this chapter 
we argue that the study of complex, diverse phenomena like organic evolution 
requires complex, multilevel theories but that such theories are best built from 
toolkits made up of a diverse collection of simple models. Because individual 
models in the toolkit are designed to provide insight into only selected aspects 
of the more complex whole, they are necessarily incomplete. Nevertheless, stu- 
dents of complex phenomena aim for a reasonably complete theory by studying 
many related simple models. The neo-Darwinian theory of evolution provides 
a good example: fitness-optimizing models, one and multiple locus genetic mod- 
els, and quantitative genetic models all emphasize certain details of the evolu- 
tionary process at the expense of others. While any given model is simple, the 
theory as a whole is much more comprehensive than any one of them. 

Our argument is not very original; the conscious use of the strategy of using 
simple models to study complex phenomena goes back at least as far as Weber's 
(1949) use of “ideal types” to study human societies. Good modern expositions 
include those by Levins (1966, 1968), Liebenstein (1976), Wimsatt (1980), and 
Quinn and Dunham (1983). If we can contribute anything useful to the case for 
simple models, it is because our work has involved extending standard evolu- 
tionary theory to a particularly troublesome complexity, cultural inheritance of 
humans (and in rudimentary form, of some other organisms). This work makes a 
variety of uses of starkly simple evolutionary models, including models based on 
the assumption of fitness optimization. Yet one of our concerns has been to 
determine the conditions under which fitness optimization models will fail to 
account for human behavior. Perhaps we have acquired a self-conscious aware- 
ness of some of the tactical details of the simple-model strategy that will be of 
some use to others. 


The Complexity and Diversity of Evolutionary Processes 

Evolutionary processes are both extremely complex and extremely diverse. On 
this count, those who are skeptical of simple models are certainly on solid 
ground. Every evolving population has a complex history in which many pro- 
cesses have contributed to its evolution, including perhaps drift, migration, 
mutation, and many other things besides selection. Further, each of these pro- 
cesses can be broken down into a series of interacting subprocesses, each en- 
compassing many varieties. Take selection. There is selection on genes with 
large effects, selection on quantitative characters, selection on correlated char- 
acters and pleiotropic genes, frequency- and density-dependent selection, se- 
lection on sex-limited and sex-linked characters, sexual selection of a couple 
of kinds, and so on. Aside from viruses, all organisms have an intimidatingly 
large number of interacting genes and phenotypic characters. Environments 
vary in space and time with large effects on migration and selection. Age, sex, 
and social organization structure populations and affect their response to evo- 
lutionary processes. Developmental processes are complex, although poorly 
understood, and perhaps affect evolution in fundamentally important ways. Or- 
ganisms affect their environments as they evolve. In the case of cultural evolu- 
tion, additional complexities are introduced. We must understand the details of 
how individuals acquire and modify attitudes and beliefs, how different attitudes 
and beliefs interact with genes and environment to produce behavior, and how 
behavior and environment interact to produce consequences for individual lives. 
Obviously, the study of evolutionary processes must somehow cope with this 
complexity. 

Evolutionary processes are diverse because different populations are quite 
different from one another in terms of their biology and the environments to 
which they are and have been exposed. Discoveries about the concatenation of 
processes affecting the evolution of one population or species do not necessarily 
say very much about those in others. In the case of cultural evolution, the details 
of the cultural transmission process vary appreciably from culture to culture. In 
some, fathers are more important in childhood socialization; in others, less. 
Modern societies depend on formal teachers; in traditional societies members 
of the extended family are often important, and so on. Our models of cultural 
evolution suggest that such structural differences can be quite important to 
understanding what cultural traits might evolve. 

Culture and the Evolutionary Process 

In this section, in order to provide a body of detailed examples for use in the later 
sections, we shall sketch some theoretical results from our own work on the com- 
plexities in the evolutionary process caused by culture. Other kinds of complexities 
of the evolutionary process could be used instead, but we know this one best. 

In the last few years, a number of scholars have attempted to understand the 
processes of cultural evolution in Darwinian terms. Social scientists (Campbell, 
1965, 1975; Cloak, 1975; Durham, 1976; Ruyle, 1973) have argued that the 
analogy between genetic and cultural transmission is the best basis for a general 
theory of culture. Several biologists have considered how culturally transmitted 
behavior fits into the framework of neo-Darwinism (Pulliam and Dunford, 1980; 
Lumsden and Wilson, 1981; Boyd and Richerson, 1983a,b). Other biologists and 
psychologists have used the formal similarities between genetic and cultural 
transmission to develop theory describing the dynamics of cultural transmission 
(Cavalli-Sforza and Feldman, 1973, 1981; Cloninger, Rice, and Reich, 1979; 
Eaves et al., 1978). 

The idea that unifies all this work is that social learning or cultural trans- 
mission can be modeled as a system of inheritance; to understand the macro- 
scopic patterns of cultural change we must understand the microscopic processes 
that increase the frequency of some culturally transmitted variants and reduce 
the frequency of others. Put another way, to understand cultural evolution we 
must account for all of the processes by which cultural variation is transmitted 
and modified. This is the essence of the Darwinian approach to evolution. We 
(Boyd and Richerson, 1985) have been particularly interested in the question of 
the origin of cultural transmission. Under what circumstances might selection on 
genes favor the existence of a second system of inheritance based on the principle 
of the inheritance of acquired variation? 

Cultural and genetic transmission are similar in some respects. For example, 
the skills and dispositions transmitted during enculturation of children by par- 
ents create patterns of behavior that are very difficult to distinguish empirically 
from patterns resulting from genetic influences. 

In other respects, cultural and genetic transmission differ sharply. First, 
culture is transmitted by an individual observing the behavior of others or by the 
naive being taught by the experienced. This means that behavior modified by 
trial-and-error learning can subsequently be transmitted; culture is a system for 
the inheritance of acquired variation. Second, patterns of cultural transmission 
are quite different from patterns of genetic transmission. Models other than 
biological parents are often imitated, including peers, grandparents, and so forth. 
The cultural analogues of generation length and the mating system are different 
from, and more variable than, the genetic case. Finally, the naive individual ac- 
quiring an item of culture is a more or less active decision-making participant in 
the transmission process. To some extent, we choose what traits we learn from 
others, but a zygote cannot choose its genes. 

The goal of the Darwinian approach to cultural evolution is to understand 
cultural change in terms of the forces that act on cultural variation as individuals 
acquire cultural traits, use the acquired information to guide behavior, and act as 
models for others. What processes increase or decrease the proportion of people 
in a society who hold particular ideas about how to behave? We thus seek to 
understand the cultural analogues of the forces of natural selection, mutation, 
and drift that drive genetic evolution. These are divisible into three classes: 
random forces, decision-making forces, and natural selection operating directly 
on cultural variation. 

The random forces are the cultural analogues of mutation and drift in genetic 
transmission. Intuitively, it seems likely that random errors, individual idiosyn- 
cracies, and chance transmission play a role in behavior and social learning. For 
example, linguists have documented a good deal of individual variation in speech, 
some of which is probably random individual variation (Labov, 1972). Similarly, 
small populations might well lose rare skills or knowledge by chance, for exam- 
ple, due to the premature death of the only individuals who acquired them 
(Diamond, 1978). 
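
To see why small populations are especially prone to losing rare skills by chance, consider a deliberately crude copying scheme: each naive individual copies one model drawn at random, and a skill is lost if nobody happens to copy a skilled model. The scheme, the function name, and the numbers are assumptions of this sketch, intended only to make the sampling argument concrete.

```python
def loss_probability(n_learners, n_models, n_skilled):
    """Chance that a skill disappears in one generation.

    Assumes each of n_learners copies one of n_models chosen uniformly
    at random, independently of the others, and that n_skilled of the
    models carry the skill.
    """
    if not 0 <= n_skilled <= n_models:
        raise ValueError("n_skilled must be between 0 and n_models")
    return (1.0 - n_skilled / n_models) ** n_learners


# Same frequency of skilled models (10 percent), different population sizes.
print(loss_probability(n_learners=10, n_models=10, n_skilled=1))
print(loss_probability(n_learners=100, n_models=100, n_skilled=10))
```

At the same frequency of skilled individuals, the chance of losing the skill in a single generation is far higher in the population of 10 than in the population of 100.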

Decision-making forces result when naive individuals evaluate alternative 
behavioral variants and preferentially adopt some variants relative to others. 
Naive individuals may be exposed to a variety of models and preferentially imi- 
tate some rather than others. We call this force biased transmission. Alternatively, 
individuals may modify existing behaviors or invent new ones by individual 
learning. If the modified behavior is then transmitted, the resulting force is much 
like the guided, nonrandom variation of classical “Lamarckian” transmission. 

The decision-making forces are derived forces (Campbell, 1965). Decisions 
require rules for making them, and ultimately the rules must derive from the 
action of other forces. These decision-making rules may be acquired during an 
earlier episode of cultural transmission, or they may be genetically transmitted 
traits that control the neurological machinery for acquisition and retention of 
cultural traits. The latter possibility is the basis of the various sociobiological 
hypotheses about cultural evolution (Alexander, 1979; Lumsden and Wilson, 
1981). The authors of these hypotheses, among others, argue that the course of 
cultural evolution is determined by natural selection operating indirectly on 
cultural variation via the decision-making forces. 

Natural selection may also operate directly on cultural variation. Selection is 
an extremely general evolutionary process (Campbell, 1965). Darwin was able 
to formulate a clear statement of natural selection in the absence of a correct 
understanding of genetic inheritance because it is a force that will operate on any 
system of inheritance with a few key properties. There must be heritable vari- 
ation, the variants must affect phenotype, and the phenotypic differences must 
affect individuals’ chances of transmitting the variants they carry. That variants 
are transmitted by imitation rather than sexual or asexual reproduction does not 
affect the basic argument, nor does the possibility that some of the variants were 
originally acquired under the guidance of individual decisions. Darwin had no 
problem in imagining that random variation, acquired variation, and natural 
selection all acted together as forces in organic evolution. In the case of cultural 
evolution, we see none either. 

We have attempted to construct a series of models that represent all of 
the processes sketched in the previous section. One interesting general result is 
that the processes of cultural evolution can easily lead to the evolution of be- 
haviors that reduce Darwinian fitness, especially when nonparental individuals 
are important in cultural transmission. In the simplest model we have analyzed 
(Richerson and Boyd, 1984) natural selection acting on cultural variation trans- 
mitted by a parent and a “teacher” may cause the trait favoring transmission via 
teachers to go to fixation at a cost in terms of the number of children produced 
by parents. Some Darwinian students of humans (Alexander, 1979; Lumsden 
and Wilson, 1981; Durham, 1976) argue that such effects are unlikely to be im- 
portant because a system of cultural inheritance with such properties would 
not be favored by selection on genes. Selection, the argument would run, ought 
to have acted to prevent such distorted cultural adaptations by either (1) the 
creation of decision-making forces that counteract the effect of selection on 
nonparentally transmitted cultural variation or (2) preventing nonparental in- 
dividuals from becoming important in cultural transmission. 

We believe this argument is incomplete because it ignores the fact that 
individual decision making may be costly compared to social learning. If the costs 
of using individual decision-making processes are high, selection may not favor 
decision-making forces that would completely compensate for the maladaptive 
effects of nonparental transmission. Similarly, if nonparental patterns of cultural 
transmission offer advantages to individuals of economy in information acqui- 
sition, selection on the genes that underlie a capacity for asymmetric transmis- 
sion may be favored. 

For example, nonparental individuals may be more useful models than 
parents because they may be more skilled or knowledgeable than parents. The 
effort in decision making required to discriminate exactly among the adaptive 
skills and maladaptive inclinations of teachers and other nonparental models may 
require extensive, costly, empirical checks of each element of the teacher's be- 
havior. In contrast, the use of relatively simple, low-cost decision-making rules 
to bias the choice of models or which of their behaviors to imitate may sub- 
stantially increase a naive person’s skills at a tolerable cost of imitating some 
maladaptive behaviors. We have analyzed the evolutionary consequences of a 
variety of simple bias rules. These models suggest that nonparental transmission 
may often be adaptive despite the cost of selection, especially in spatially variable 
environments (Boyd and Richerson, 1982, 1985: chs. 7 and 8). In essence, hu- 
mans may accept the cost of imitating maladaptive cultural traits because the 
alternatives are a high frequency of random errors or extreme decision-making 
costs. Even when a cultural system of inheritance optimizes genetic fitness when 
averaged over all the traits it transmits, many traits taken individually may be 
quite far from those that would optimize fitness. 

Even more extreme violations of the genetic fitness-optimizing model are 
conceivable. For example, if rules of mate choice are transmitted culturally, 
human genes might be “domesticated” to serve cultural functions. On the other 
hand, perhaps the critics of these models are correct, and the abstract possibilities 
demonstrated by such models are empirically unimportant. The essential point is 
that, like many bits of genetic realism, adding culture to the evolutionary process 
might make a qualitative difference in the behavior we expect to observe com- 
pared to that expected from the simple fitness optimizing caricature of evolution. 

Why Families of Simple Models 

Disadvantages of Complex Models 

In the face of the complexity of evolutionary processes, the appropriate strategy 
may seem obvious: to be useful, models must be realistic; they should incor- 
porate all factors that scientists studying the phenomena know to be important. 
This reasoning is certainly plausible, and many scientists, particularly in eco- 
nomics (e.g., Hudson and Jorgenson, 1974) and ecology (Watt, 1968), have 
constructed such models, despite their complexity. On this view, simple models 
are primitive, things to be replaced as our sophistication about evolution grows. 

Nevertheless, theorists in such disciplines as evolutionary biology and eco- 
nomics stubbornly continue to use simple models even though improvements in 
empirical knowledge, analytical mathematics, and computing now enable them 
to create extremely elaborate models if they care to do so. Theorists of this 
persuasion eschew more detailed models because (1) they are hard to under- 
stand, (2) they are difficult to analyze, and (3) they are often no more useful for 
prediction than simple models. Let us now consider each of these points in turn. 

Complex, detailed models are usually extremely difficult to understand. As 
more realism is added, the myriad interactions within the model become almost 
as opaque as the real world we wish to understand. When a set of not-so- 
complex parts is linked into an interacting complex, it is often impossible to 
understand why the results behave as they do. To substitute an ill-understood 
model of the world for the ill-understood world is not progress. In the end, the 
only way to understand how such a model works is to abstract pieces from it or 
study simplified cases where its behavior is more transparent. Even when 
complex models are useful, they are so because we understand how they work in 
terms of simple models abstracted from them. 

Costly, complex models are most likely to be scientifically justified when 
phenomena are complex but not diverse. It is worth studying the complexities of 
atoms in great detail because there are only a few kinds, and they all obey the 
same basic laws. The generality of such laws makes them worth knowing even if 
the task is difficult. The equivalent sophistication in a model of the evolution of a 
given society or species is possible, perhaps, but unlikely to be justified on sci- 
entific grounds because of limited generalizability to other species or societies. 

The analysis of complex models is also expensive and time consuming. The 
complexity of a recursion model is roughly measured by the number of inde- 
pendent variables that must be kept track of from generation to generation. It 
usually is not possible to analyze nonlinear recursions involving more than a 
handful of variables without resorting to numerical techniques. Until the advent 
of digital computers, obtaining numerical solutions was impractical. Since then, 
however, there have been many attempts to make computer simulation models 
of complex social and biological processes. These projects have generally been 
quite costly. As the number of variables in a model increases, the number of 
interactions between variables increases even faster. This means that even with 
the fastest computers, it is not practical to explore the sensitivity of a model to 
changes in assumptions about very many of its constituent interactions. Con- 
siderations of economy of effort in scientific practice dictate that we should be 
satisfied with much simpler models than we could build in principle. 
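
The arithmetic behind this point is easy to check. The sketch below counts pairwise interactions among n variables and the number of model runs needed to try every combination of a few values per parameter. Treating interactions as pairwise only, and exploring sensitivity with a full grid of parameter values, are simplifying assumptions of this example.

```python
from math import comb


def pairwise_interactions(n_variables):
    """Number of distinct pairs among n variables: n(n - 1) / 2."""
    return comb(n_variables, 2)


def sensitivity_runs(n_parameters, levels_per_parameter):
    """Runs needed to try every combination of parameter values
    when each parameter is tested at the same number of levels."""
    return levels_per_parameter ** n_parameters


for n in (3, 10, 30, 100):
    print(n, pairwise_interactions(n), sensitivity_runs(n, 3))
```

Going from 10 to 100 variables multiplies the number of pairs by more than a hundred, and the number of runs in a three-level grid becomes astronomically large well before that.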

Complex, realistic models are sometimes employed when prediction rather 
than understanding is the main goal. Numerical weather prediction models and 
economic forecasting models come to mind. In both cases the gains in under- 
standing of atmospheric and economic phenomena are mostly attributable to the 
constituent simple submodels of particular processes that are individually not 
much good for prediction. The marginal increase in understanding relative to 
cost in the large predictive models is so small that only their practical application 
justifies their expense; scientific discovery would be better served by more 
attention to the simpler models. As Dupre (1987) observes, explanation differs 
from prediction in being easier to achieve (leaving aside statistical models that 
make no pretensions to explanation). We would argue in addition that expla- 
nation or understanding is scientifically far more fundamental than prediction. 
This is most clearly evident in examples such as the simple deterministic models 
of economic and population processes that can exhibit chaotic behavior (Day, 
1982; May, 1976). If these models prove to apply in the real world, they will 
guarantee that only short-range predictions are possible with less than perfect 
specification of initial conditions, but they also give a quite satisfactory expla- 
nation of why this is so. The problem is well understood in the context of a 
purely physical problem, weather prediction (Smagorinsky, 1969). 
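
A standard textbook illustration of such a simple deterministic model is the logistic map, x(t+1) = r x(t) (1 - x(t)). The sketch below follows two trajectories whose starting points differ by one part in a billion. The choice of r = 4, the starting value 0.2, and the number of steps are assumptions made for this example.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate the logistic map x -> r * x * (1 - x) from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)  # almost identical initial condition

# Gap between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
print("gap after 5 steps:", gaps[5])
print("largest gap within 100 steps:", max(gaps))
```

The two runs agree closely for the first few steps and then diverge completely, so the model explains why long-range prediction fails even though its rule is fully known.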

Detailed models of complex social or biological systems are often not much 
more useful for prediction than are simple models. Detailed models usually re- 
quire very large amounts of data to determine the various parameter values in the 
model. Such data are rarely available. Moreover, small inaccuracies or errors 
in the formulation of the model can produce quite erroneous predictions. The 
temptation is to “tune” the model, making small changes, perhaps well within 
the error of available data, so that the model produces reasonable answers. 
When this is done, any predictive power that the model might have is due more 
to statistical fitting than to the fact that it accurately represents actual causal 
processes. It is easy to make large sacrifices of understanding for small gains in 
predictive power. Contrarily, although evolutionary processes are inherently 
complex and diverse, models with a few variables may capture enough of the 
really important processes in a given case or class of cases both to explain and to 
predict with tolerable accuracy. 
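The danger of tuning can be made concrete with a toy comparison, offered only as a sketch under stated assumptions. The data below are invented for illustration: a linear process y = 2x observed with small fixed errors. The "tuned" model is a degree-5 interpolating polynomial that reproduces every observation exactly; the simple model is a least-squares line. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every data point (a model 'tuned' to fit exactly)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


# Invented observations: true process y = 2x plus small measurement errors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
errors = [0.3, -0.4, 0.2, -0.3, 0.4, -0.2]
ys = [2.0 * x + e for x, e in zip(xs, errors)]

line = fit_line(xs, ys)
tuned = fit_interpolant(xs, ys)

# In-sample fit: the tuned model looks perfect, the line does not.
tuned_fit_error = max(abs(tuned(x) - y) for x, y in zip(xs, ys))
line_fit_error = max(abs(line(x) - y) for x, y in zip(xs, ys))

# Out-of-sample prediction at x = 8, where the true value is 16.
line_error = abs(line(8.0) - 16.0)
tuned_error = abs(tuned(8.0) - 16.0)
print(line_fit_error, tuned_fit_error, line_error, tuned_error)
```

The tuned model's perfect agreement with the data comes from fitting the errors rather than the process, so its prediction outside the observed range is far worse than that of the two-parameter line.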

The Utility of Simple Models 

In the face of these difficulties, the most useful strategy will usually be to build a 
variety of simple models that can be completely understood but that still capture 
the important properties of the processes of interest. Liebenstein (1976: ch. 2) 
calls such simple models “sample theories.” Students of complex and diverse 
subject matters develop a large body of models from which “samples” can be 
drawn for the purpose at hand. Useful sample theories result from attempts to 
satisfy two competing desiderata: they should be simple enough to be clearly and 
completely grasped, and at the same time they should reflect how real processes 
actually do work, at least to some approximation. A systematically constructed 
population of sample theories and combinations of them constitutes the theory 
of how the whole complex process works. 

The synthetic theory of evolution provides a good example. Each of the 
basic processes (e.g., selection, mutation, drift) is represented by a large variety 
of simple models, some specific to a particular population, and others quite 
general. These models are combined in different ways to represent interesting 
phenomena (e.g., sexual selection, speciation). This whole family of models, 
together with a knowledge of which models are appropriate for what kinds of 
situations, constitutes the theoretical system of population biology. 

A theoretical system so constituted from simple sample models is a com- 
plicated and diverse collection of knowledge; it cannot be legitimately labeled 
simpleminded. Still, every tactical deployment of models to study a question of 
interest will be quite simple compared to the phenomena that they are intended 
to represent. The sample models are caricatures. If they are well designed, they 
are like good caricatures, capturing a few essential features of the problem in a 
recognizable but stylized manner and with no attempt to represent features not 
of immediate interest. 

Wimsatt (1980, 1981) provides good general discussions of tactical con- 
siderations in the deployment of simple models. To Wimsatt, all sample models 
of evolutionary phenomena should be viewed as “heuristics” rather than uni- 
versally applicable laws. This terminology has the virtue of emphasizing that all 
sample models have defects. They usefully apply only over a limited range of 
phenomena, and even over the range where they are useful they are almost 
certain to have biases. Even the very best scientific heuristic (or sample model) 
will fail and possibly mislead if pushed too far or in the wrong direction. It is in 
attention to details of the use of simple sample theories that these problems are 
minimized and the maximum understanding gained. The user attempts to dis- 
cover “robust” results, conclusions that are at least qualitatively correct, at least 
for some range of situations, despite the complexity and diversity of the phe- 
nomena they attempt to describe. 

Note that simple models can often be tested for their scientific content via 
their predictions even when the situation is too complicated to make practical 
predictions. Experimental or statistical controls often make it possible to expose 
the variation due to the processes modeled, against the background of “noise” 
due to other ones, thus allowing a ceteris paribus prediction for purposes of 
empirical testing. Simple models, in other words, are the formal theoretical 
parallel of the experimental and comparative methods so widely used in biology 
and the social sciences. 

Generalized Sample Theories 

Generalized sample theories are an important subset of the simple sample 
theories used to understand complex, diverse problems. They are designed to 
capture the qualitative properties of the whole class of processes that they are 
used to represent, while more specialized ones are used for closer approxima- 
tions to narrower classes of cases. Generalized sample theories are useful because 
we do not seem to be able to construct models of social and biological 
phenomena that are general, realistic, and precisely predictive (Levins, 1966, 1968). 
That is, evolutionary biologists and social scientists have not been able to satisfy 
the epistemological norm derived from the physical sciences that holds that the- 
ory be in the form of universal laws that can be tested by the detailed predictions 
they make about the phenomena considered by the law. This failure is probably 
a consequence of the complexity and diversity of living things. Basic theoretical 
constructs like natural selection are not universal laws like gravitation; rather, 
they are taxonomic entities, general classes of similar processes that nonetheless 
have a good deal of diversity within the class. A theoretical construct designed to 
represent the general properties of the class of processes labeled natural selection 
must sacrifice many of the details of particular examples of selection. On the 
other hand, a model tailored to the details of a particular case is unlikely to 
have much relevance beyond that case. Further, the most precise predictions 
may be obtained by statistical models that sacrifice realism and hence are useless 
as explanatory devices. 

One might agree with the case for a diverse toolkit of simple models but still 
doubt the utility of generalized sample theories. Fitness-maximizing calculations 
are often used as a simple caricature of how selection ought to work most of 
the time in most organisms to produce adaptations. Does such a generalized 
sample theory have any serious scientific purpose? Some might argue that their 
qualitative kind of understanding is, at best, useful for giving nonspecialists a 
simplified overview of complicated topics and that real scientific progress still 
occurs entirely in the construction of specialized sample theories that actually 
predict. A sterner critic might characterize the attempt to construct generalized 
models as loose speculation that actually inhibits the real work of discovering 
predictable relationships in particular systems. 

These kinds of objections implicitly assume that it is possible to do science 
without any kind of general model. All scientists have mental models of the world. 
The part of the model that deals with their disciplinary specialty is more detailed 
than the parts that represent related areas of science. Many aspects of a scientist’s 
mental model are likely to be vague and never expressed. The real choice is be- 
tween an intuitive, perhaps covert, general theory and an explicit, often mathe- 
matical, one. 

It seems to us that generalized sample models such as fitness-optimizing 
models do play an important role. Well chosen to represent the stripped-down 
essence of a much larger set of more specialized models, generalized sam- 
ple theories serve important functions in scientists’ cognitive organization of 
complex-diverse subject matters and in communication between specialists. For 
example, we are concerned with the details of how cultural transmission occurs, 
a subject studied by psychologists (Boyd and Richerson, 1985: ch. 3). Social 
learning theorists have made many, but not all, of the kinds of measurements 
that are necessary for specifying good sample theories of cultural transmission. 
Crucial unknowns include the mechanisms by which variation and covariation 
are maintained in cultural traits. These properties have important implications 
for the process of cultural evolution because the selection and bias forces depend 
on the maintenance of variation for their effectiveness. These deficiencies of 
social learning theory are not at all apparent in the absence of a theory linking the 
psychology of enculturation with the macroscopic phenomena of social in- 
stitutions and long-run outcomes. It seems unlikely that a sensible psychologist 
would be motivated to make the arduous and costly experiments necessary to 
determine such processes without a general theoretical argument justifying their 
importance. This is an example of a common situation: constructing models that 
make such links, even if they are simple caricatures, often shows that processes 
with small, relatively hard to measure, effects can produce major results. 

The relationship between a generalized sample theory and empirical test or 
prediction is a subtle one. To insist upon empirical science in the style of physics is 
to insist upon the impossible. However, to give up on empirical tests and pre- 
diction would be to abandon science and retreat to speculative philosophy. 
Generalized sample theories normally make only limited qualitative predictions. 
The logistic model of population growth is a good elementary example. At best, 
it is an accurate model only of microbial growth in the laboratory. However, it 
captures something of the biology of population growth in more complex cases. 
Moreover, its simplicity makes it a handy general model to incorporate into 
models that must also represent other processes such as selection, and intra- and 
interspecific competition. If some sample theory is consistently at variance with 
the data, then it must be modified. The accumulation of these kinds of mod- 
ifications can eventually alter general theory, either by compelling the aban- 
donment of some sample models or by systematizing knowledge about the 
variation of processes. In extreme cases, major discoveries in some of the com- 
ponents of a general theory can compel the reorganization of the entire edifice, as 
exemplified by the impact of Mendelian genetics on Darwinian theory in biology. 
No one nowadays would think of using Karl Pearson’s models of the inheritance 
of acquired variation as a sample theory of genetic inheritance, although they 
might have some specialized uses in the study of cultural evolution. 
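For reference, the logistic model of population growth mentioned above is usually written as follows. The symbols are the conventional ones and are assumptions here, since the text does not define them: N is population size, r the intrinsic rate of increase, K the carrying capacity, and N_0 the initial population size.

```latex
\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right),
\qquad
N(t) = \frac{K}{1 + \dfrac{K - N_0}{N_0}\, e^{-rt}}
```

With only two parameters, the model yields the limited qualitative prediction of the kind discussed here: growth is nearly exponential when N is small relative to K and levels off as N approaches K.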

A generalized model is useful so long as its predictions are qualitatively 
correct, roughly conforming to the majority of cases. It is helpful if the inevitable 
limits of the model are understood. It is not necessarily an embarrassment if more 
than one alternative formulation of a general theory, built from different sam- 
ple models, is more or less equally correct. In this case, the comparison of theories 
that are empirically equivalent makes clearer what is at stake in scientific con- 
troversies and may suggest empirical and theoretical steps toward a resolution. 


Some Remarks on the Strategy of Building Simple Models 

One of the main points of the preceding discussion is that the analysis of evo- 
lutionary problems using simple models depends very much on the appropriate 
choice of those models. How does one go about making such choices? Evolu- 
tionary biologists and social scientists use a variety of methods to accomplish this 
task that, we believe, can be collected under three main headings, corresponding 
to idealized analytical steps: (1) the choice of problem, (2) the modularization 
of analysis, and (3) the construction of synthetic hypotheses that we shall 
call “plausibility arguments.” 

Choice of Problem 

When one uses simple models to understand complex and diverse problems, the 
choice of the problem to be analyzed exerts a strong influence on the kinds of 
simplifications one chooses. The idea is to simplify most drastically those aspects 
that are not centrally related to the problem at hand in order to retain the 
maximum feasible detail in the features of most direct interest. In the case of our 
models of cultural evolution, we have been concerned with the evolution of cul- 
tural organisms from acultural ancestors. This required us to represent the pro- 
cesses of ordinary organic evolution in most of our modeling efforts. Still, we were 
also interested in trying to develop preliminary general models of the important 
structural features and forces that affect cultural evolution. Given this choice of 
problem, it seemed advisable to use very simple models of genetic processes to 
represent the evolution of genetic capacities for culture in order that the models 
of cultural transmission could be made a bit more elaborate. Thus, we frequently 
asked what parameter value of a model controlling the propensity to acquire 
culture in a certain way would cause fitness to be optimized. Those models that 
included specific genetics used only the simplest haploid, one locus, or quanti- 
tative models of genetic transmission. 

Models emphasizing cultural detail at the expense of genetic detail accept 
the risk that some particular complexity of the human genetic system plays a 
direct role in the coevolution of genes and culture. For example, if genes af- 
fecting the behavior toward relatives are transmitted on the Y chromosome, 
as Hartung (1976) suggested, the models we constructed might turn out to be 
seriously misleading. The opposite risk, however, seemed more serious to us in 
the context of the problem; in models that are too complex, the important de- 
tails of culture itself might be obscured or lost. Several commentators (Maynard 
Smith and Warren, 1982; Boyd and Richerson, 1983b; Kitcher, 1985) have re- 
marked that the analysis that led Lumsden and Wilson (1981) to their “thousand 
year rule” is dubious because key properties of culture disappear as a result of 
simplifying assumptions. The general formulation of their model is conceptually 
satisfactory, but its complexity appears to have dictated misleading simplifica- 
tions in the interests of successful analysis. 

Modularization of Analysis 

Most interesting evolutionary problems involve the interaction of evolutionary 
processes and a particular pattern of genetic transmission and gene expression. 
For example, the interaction of selection and mutation at a diploid locus is a 
classic problem of the synthetic theory. The sample models of the parts of this 
problem are less interesting than the combination of them in a model that can 
help us understand how the two basic forces interact with genetically inherited 
variation. Similar problems are of interest in cultural evolution. How does 
learning, acting as an evolutionary force because learned variants can be imitated, 
interact with selection, both selection on the cultural variants and on the un- 
derlying senses of reward and punishment that guide learning? Such combina- 
tions of processes inevitably make for relatively complex models. To make any 
headway, relatively difficult mathematical and experimental procedures have to 
be introduced, and many simplifying assumptions have to be made. Difficult 
choices between analytical tractability, comprehensibility, generality, and real- 
ism have to be made. Is a fitness optimization representation of the genetic 
process a reasonable simplification, or can some additional genetic realism be 
usefully retained in the context of the problem? 

The answers to such questions are sought by breaking the problem down 
first into its constituent sample models and then reassembling them step by step 
into more complex combinations. This tactic is obvious but easily misunderstood 
and misused. In the long run, the simple models strategy leads to large families 
of well-understood sample models, some of which will be relatively complex, 
specialized, and difficult to understand. Also, relatively complex combinations of 
models are often useful. However, such relatively complicated models depend 
on a thorough understanding of the simplest models of each family and of the 
constituent submodels of compound models. The possibility for artifactual re- 
sults increases with the complexity of the analysis unless one can be reasonably 
confident that the constituent sample models are empirically reasonable and 
mathematically well behaved. It is relatively much easier to conduct experiments 
and detailed mathematical analysis on processes when they are isolated than 
when they are imbedded in a complex system. In population biology, both 
history and pedagogic practice suggest that one must begin with an under- 
standing of the elementary constituents of the theory. 
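The tactic of studying constituent sample models in isolation and then reassembling them can be sketched as follows. This is a deliberately simplified illustration, not one of the models discussed in the text: it assumes a haploid, one-locus population with one-way mutation, and the function names and parameter values (s, u) are hypothetical.

```python
def select(p, s):
    """Selection module: p is the frequency of a deleterious allele with
    relative fitness 1 - s (the alternative allele has fitness 1)."""
    mean_fitness = 1.0 - s * p
    return p * (1.0 - s) / mean_fitness


def mutate(p, u):
    """Mutation module: the favored allele mutates to the deleterious one at rate u."""
    return p + u * (1.0 - p)


def iterate(step, p0, generations):
    """Apply a one-generation update rule repeatedly."""
    p = p0
    for _ in range(generations):
        p = step(p)
    return p


s, u = 0.1, 0.001  # illustrative values only

# Each module studied alone gives an extreme outcome.
only_selection = iterate(lambda p: select(p, s), 0.5, 2000)
only_mutation = iterate(lambda p: mutate(p, u), 0.5, 2000)

# Reassembled, the two forces balance at an intermediate frequency u / s.
combined = iterate(lambda p: mutate(select(p, s), u), 0.5, 2000)
print(only_selection, only_mutation, combined, u / s)
```

Because each module is well understood on its own, the behavior of the combination (a balance near u / s in this sketch) can be checked against what the parts would lead one to expect, which is the safeguard against artifactual results that the passage describes.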

While building models of complex processes composed of simpler modules 
may be second nature to evolutionary biologists, in our experience it sometimes 
confuses social scientists who read the present body of theory in cultural evolu- 
tion. The modularization of complex problems seems reductionistic; even after 
the parts are reassembled it seems to some readers as if the models are attempt- 
ing to deduce the properties of wholes from properties of parts. The tactical 
“reductionism” used to understand a problem does not imply that the interaction 
of parts might not produce irreducible effects. For example, some models of 
culture built using this tactic suggest that group selection might be especially 
likely under some plausible forms of cultural transmission (Boyd and Richerson, 
1985: ch. 7). 

Sometimes, evolutionary biologists (and social scientists who use similar 
methods, such as economists) contribute to the confusion by failing to distinguish 
the heuristic use of tactical reductionism from a real belief that 
some particular simple model is a true description of a complex process. Indeed, 
the relative ease with which interesting, even approximately correct, results can 
be obtained for intrinsically rather complex processes with simple models can 
lead the unwary to conclude that successful tactical reduction implies the ade- 
quacy of a philosophical reductionist stance. Those who are so tempted should 
consult Wimsatt’s work. Most users of simple models know better. For example, 
Dawkins (1982), a prototypical genetic reductionist by some accounts (Sober, 
1984), begins his discussion (pp. 1-2) by asking the reader to take his idea of 
selfish genes with extended phenotypes as a heuristic model. Later (by p. 7), 
Dawkins does express the hope that it may prove more fundamental than a mere 
heuristic, but the distinction between the two interpretations is clear, and the 
reader is left the choice. 

The development of a formal theory of cultural evolution is in its infancy, 
and attention has properly concentrated on quite elementary models. This means 
that the theory to date appears quite reductionistic. For example, most models 
consider only one cultural trait. On the one hand, an overenthusiast might claim 
that these models are relatively successful in explaining human behavior and 
hence that human cultures really can be atomized into traits. On the other hand, 
a critic might complain that they are completely bankrupt because they do not 
take account of the fact that cultural traits must interact in complex ways. The 
fact is that such preliminary models are silent about what complexities might 
flow from the interaction of multiple traits. That is a difficult question in its own 
right, but one whose analysis must be deferred until we understand the simpler 
theoretical elements we might use in such an analysis. 

The thorough study of simple models includes pressing them to their ex- 
treme limits. This is especially useful at the second step of development, where 
simple models of basic processes are combined into a candidate generalized 
model of an interesting question. There are two related purposes in this exercise. 

First, it is helpful to have all the implications of a given simple model 
exposed for comparative purposes, if nothing else. A well-understood simple 
sample theory serves as a useful point of comparison for the results of more 
complex alternatives, even when some conclusions are utterly ridiculous. 

Second, models do not usually just fail; they fail for particular reasons that 
are often very informative. Just what kinds of modifications are required to make 
the initially ridiculous results more nearly reasonable? For example, the failures 
of the logistic model of population growth suggest the amendments needed to 
make better models. In the case of culture, models that include only faithful 
cultural transmission suggest that culture is generally inferior to genes as a mode 
of inheritance (Cavalli-Sforza and Feldman, 1983). If the evolution of culture in 
the hominid line was favored by natural selection, there must be more to the 
story than just the acquisition of behavior by imitation. We have suggested that 
the ability of culture to couple individual learning to a transmission mechanism, 
thus to generate a system for the inheritance of acquired variation, could cause 
capacities for culture to evolve (Boyd and Richerson, 1983a, 1985: ch. 4). However, 
this analysis also fails because it suggests that the advantages of culture are 
quite general, and hence that many organisms ought to have “Lamarckian” 
systems of inheritance. This failure in turn suggests that there are other costs to 
the inheritance of acquired variation that must be accounted for. 

In both of these respects, human sociobiology has made a major contribu- 
tion by showing what must be true if the genetic fitness optimizing model 
generally holds when behavioral variation is proximally transmitted by culture. 
For example, Alexander (1979; see also Flinn and Alexander, 1982) argues that 
decision-making forces are powerful enough to constrain cultural variation to 
maximize fitness in most circumstances. Important qualitative predictions flow 
from this argument. If strong, accurate decision making is possible, then humans 
need not depend on relatively passive imitation; they can easily invent or choose 
those behaviors appropriate to the environments they find themselves in. If so, 
culture will behave more like ordinary mechanisms of phenotypic flexibility than 
like an inheritance system. Empirically, behavioral variation will be largely ex- 
plicable, even in the short run, in terms of environmental variation rather than 
the variation in what traits are available for imitation. This argument also implies 
that costs of making decisions are low relative to any economies that might result 
from imitation. In our judgment (Boyd and Richerson, 1985: ch. 5), theory and 
the available data suggest that Alexander’s argument is incorrect in general, 
although it may well be roughly correct for those traits for which accurate 
decision making is easy. Regardless of whether we or Alexander ultimately prove 
more nearly correct, his contribution is substantial; work on the complexities of 
culture is much aided by having the implications of the simplest genetic fitness- 
maximizing model incorporating culture cogently developed. 

The exhaustive analysis of many sample models in various combinations is 
also the main means of seeking robust results (Wimsatt, 1981). One way to gain 
confidence in simple models is to build several models embodying different 
characterizations of the problem of interest and different simplifying assump- 
tions. If the results of a model are robust, the same qualitative results ought to 
obtain for a whole family of related models in which the supposedly extraneous 
details differ. The fact that genetic and game theoretic models of altruism usu- 
ally lead to similar conclusions reassures us that general results like Hamilton’s 
k > 1/r rule are robust. Similarly, as more complex considerations are introduced 
into the family of models, simple model results can be considered robust only 
if it seems that the qualitative conclusion holds for some reasonable range of 
plausible conditions. Thus, quantitative genetic (Boyd and Richerson, 1982) and 
multiple-locus models (Uyenoyama and Feldman, 1980) suggest that Hamilton’s 
rule is approximately correct when a variety of complications is introduced. 
Complications substantially affect the exact form of the rule, but do preserve the 
qualitative result that kin cooperation can evolve and the propensity to coop- 
erate should be a function of relatedness under most circumstances that seem 
empirically reasonable. Nevertheless, it is slow and difficult work to make rea- 
sonably certain that particular results can be treated as robust (Wimsatt, 1980). 
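For readers unfamiliar with it, Hamilton's rule in its benefit-to-cost form says that cooperation toward a relative can be favored when the ratio k of the recipient's benefit to the actor's cost exceeds 1/r, where r is the coefficient of relatedness. The sketch below states only this simplest form; the function names are hypothetical, and the relatedness values are the standard ones for full siblings and first cousins in a diploid, outbred population.

```python
def threshold_ratio(relatedness):
    """Benefit-to-cost ratio that k must exceed for the rule k > 1/r to hold."""
    if relatedness <= 0:
        return float("inf")  # no positive relatedness: no finite ratio suffices
    return 1.0 / relatedness


def cooperation_favored(benefit, cost, relatedness):
    """Hamilton's rule written without division: r * benefit > cost."""
    return relatedness * benefit > cost


FULL_SIBLINGS = 0.5
FIRST_COUSINS = 0.125

print(threshold_ratio(FULL_SIBLINGS), threshold_ratio(FIRST_COUSINS))
print(cooperation_favored(3.0, 1.0, FULL_SIBLINGS))
print(cooperation_favored(3.0, 1.0, FIRST_COUSINS))
```

The more detailed models cited above change the exact form of the inequality, but the qualitative dependence on relatedness shown here is the part claimed to be robust.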

In the case of cultural evolution, we make the tentative claim that the costly 
information argument is a robust result. In all of the models we have constructed 
of the novel structural properties of culture and the evolutionary forces that 
result from them, it seems that optimizing the genetic fitness of a capacity for 
culture generally leads to a situation in which many individual cultural traits can 
easily evolve to values quite distant from those that would maximize fitness, so 
long as decision making is costly. These results do not depend on whether cul- 
tural traits are imagined to be discrete characters or continuous quantitative 
variables, for example. The tentativeness of the claim must be emphasized be- 
cause the whole corpus of models of cultural evolution is still so small. 


Plausibility Arguments 

We believe that “plausibility argument” is a useful term for a scientific construct 
that plays much the same role in the study of complex, diverse phenomena that 
mutually exclusive hypotheses are supposed to play in the investigation of 
simpler subject matters. A plausibility argument is a hypothetical explanation 
having three features in common with a traditional hypothesis: (1) a claim of 
deductive soundness, of in-principle logical sufficiency to explain a body of data; 
(2) sufficient support from the existing body of empirical data to suggest that 
it might actually be able to explain a body of data as well as or better than 
competing plausibility arguments; and (3) a program of research that might 
distinguish between the claims of competing plausibility arguments. The differences 
are that competing plausibility arguments (1) are seldom mutually exclusive, (2) 
can seldom be rejected by a single sharp experimental test (or small set of them), 
and (3) often end up being revised, limited in their generality or domain of 
applicability, or combined with competing arguments rather than being rejected. 
In other words, competing plausibility arguments are based on the claims that 
a different set of submodels is needed to achieve a given degree of realism and 
generality, that different parameter values of common submodels are required, 
or that a given model is correct as far as it goes, but applies with less generality, 
realism, or predictive power than its proponents claim. Most frequently, the 
empirical program suggested by competing plausibility arguments is an arduous 
series of measurements of the relative strengths of several known processes in a 
wide range of organisms. 

The reason for these differences is that quantitative questions are at the crux 
of debates about evolutionary processes. For example: how strong is selection 
among individuals relative to selection among groups? Theoretical analysis 
suggests that selection among groups must be commonplace, and laboratory 
experiments (Wade, 1977) demonstrate that it could have important effects. 
However, it is not at all clear whether selection among groups is important 
in nature. Sex ratio provides another example. Clear examples of sex ratio dis- 
tortion exist (Hamilton, 1967), and theory suggests that it should be favored 
under a wide variety of ecological conditions (Charnov, 1982). Yet this process 
seems to be relatively rare — at least weak enough to neglect in most cases. Even 
if we are willing to be content with qualitative knowledge of complex processes, 
the term “qualitative” must be taken in the sense of rough estimates of quan- 
titative variables, not in the sense of simple acceptance or rejection of mutually 
exclusive hypotheses. This feature of evolutionary problems is the basis for 
Quinn and Dunham’s (1983) rejection of Popperian falsification as a proper 
epistemological model in ecology and evolution (see also Rapoport’s, 1967, 
claim that many scientific paradoxes have been resolved when the polar positions 
were shown to be only opposite ends of a continuum). 

Human sociobiology provides a good example of a plausibility argument. 
The basic premise of human sociobiology is that fitness-optimizing models 
drawn from evolutionary biology can be used to understand human behavior. 
Many social scientists have objected to this enterprise on the grounds that 
evolutionary theory does not account for the existence of culture. As we have 
already noted, Alexander (1979), Lumsden and Wilson (1981), Durham (1976), 
and others have defended the fitness-optimizing approach not by denying the 
importance of culture but by proposing various means by which decision-making 
forces could evolve under the guidance of selection to constrain cultural evolution 
so as generally to produce fitness-optimizing behavior. These authors have 
supported their plausibility argument by constructing an array of simple models 
that predict the details of human behavior in various circumstances — for ex- 
ample, patterns of adoption, unilineal descent, and child abuse — and compared 
the results of these simple models with empirical data. 

The sociobiological explanations of human behavior and those derived from 
explicit models of cultural evolution provide an example of competing plausibility 
arguments. As Flinn and Alexander (1982) argue, there is wide agreement 
among Darwinian students of the problem of human evolution that culture is 
important and that the processes of cultural evolution may sometimes fail to 
keep cultural variation “on track” of genetic fitness (e.g., Alexander, 1979:142). 
Disagreements revolve around the relative strength of decision-making forces 
compared to natural selection on cultural variation, the degree to which cultural 
transmission acts like an inheritance system rather than an ordinary mechanism 
for phenotypic flexibility, the importance of nonparental transmission, and so 
forth. For example, we have argued that decision making is frequently costly and 
that this allows culture a certain autonomy, while Durham (1976) argues that 
cultural evolution will be constrained to produce behaviors that approximately 
maximize fitness most of the time. 

We think that the clearest way to address the controversial questions raised 
by competing plausibility arguments is to try to formulate models with para- 
meters such that for some values of the critical parameters the results approxi- 
mate one of the polar positions in such debates, while for others the model 
approximates the other position. If the parameters that produce these contrasting 
results capture some real features of the processes of cultural and genetic coevo- 
lution, it may be possible to understand at least what is at stake in the controversy. 
In the models we have constructed, several parameters control the extent to 
which a typical cultural trait will be at the fitness optimum. If decisions about 
what cultural behaviors to adopt or invent can be made easily and accurately, and 
the rules that guide choices are ultimately transmitted genetically and subject to 
selection, culture will be very strongly constrained to maximize genetic fitness. 
Similarly, if important cultural traits are transmitted mostly from biological 
parents to offspring, cultural variation will act much like an extra chromosome of 
a biochemically odd kind. Even if decision-making forces are weak, selection on 
cultural variation will favor individual (inclusive) reproductive success, subject
only to the same kinds of qualifications that obtain for a genetic locus. This result 
seems to approximate Durham’s (1976) argument. As decision-making costs and
nonparental transmission are allowed to become more important, cultural evo- 
lution becomes less directly constrained by selection on genes that control culture 
and it is possible to approximate positions like the group-functionalism of many 
social scientists and the afunctional position of Sahlins (1976).
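
The role of such parameters can be illustrated with a deliberately crude sketch. The recursion below is not one of the models discussed in this chapter; it is a toy, and the names `decision_force`, `selection`, `nonparental_share`, and `nonparental_bias`, along with every numerical value, are assumptions introduced only for illustration. It tracks the frequency of a fitness-optimal cultural variant and shows how the same model can approximate one polar position for some parameter values and the other position for others.

```python
def optimal_variant_frequency(decision_force, selection, nonparental_share,
                              nonparental_bias, p0=0.5, generations=500):
    """Iterate a toy recursion for the frequency of a fitness-optimal variant.

    All parameters are hypothetical illustration values, not estimates:
      decision_force    -- strength of cheap, accurate decision making
      selection         -- selection acting on parentally transmitted variation
      nonparental_share -- fraction of transmission from nonparents (0 to 1)
      nonparental_bias  -- force favoring the other variant among nonparents
    """
    # Net force on the fitness-optimal variant: positive pushes it toward
    # fixation, negative pushes it toward loss.
    net = (decision_force
           + (1.0 - nonparental_share) * selection
           - nonparental_share * nonparental_bias)
    p = p0
    for _ in range(generations):
        p = p + net * p * (1.0 - p)
        p = min(1.0, max(0.0, p))  # keep the frequency in [0, 1]
    return p


# Easy decisions and purely parental transmission: culture tracks fitness.
constrained = optimal_variant_frequency(0.05, 0.05, 0.0, 0.1)
# No effective decision force and mostly nonparental transmission.
autonomous = optimal_variant_frequency(0.0, 0.05, 0.9, 0.1)
print(round(constrained, 3), round(autonomous, 3))
```

Under these assumed values the first case drives the fitness-optimal variant close to fixation and the second drives it close to loss. Intermediate values of `nonparental_share` locate the point at which the outcome switches.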

As primitive as our own models are in this regard (see also Pulliam and 
Dunford, 1980; Werren and Pulliam, 1981; Pulliam, 1982, 1983), we think they
are a promising step. The costs of decision making and the extent to which 
important items of culture are transmitted by nonparental individuals are em- 
pirical issues that can be resolved. Indeed, data already exist on these points 
(Boyd and Richerson, 1985: chs. 3 and 5). It would be overenthusiastic to claim
that any of the controversial questions surrounding the application of Darwinism
to human culture are resolved, but we do believe that the modest body of for- 
mal theory so far developed, and empirical argument derived from the theory, 
has clarified the issues to the extent that rapid progress is now possible. 

A well-developed plausibility argument differs sharply from another com- 
mon type of argument that we call a programmatic claim. Most generally, a 
programmatic claim advocates a plan of research for addressing some out- 
standing problem without, however, attempting to construct a full plausibility 
argument. Programmatic claims can be exceedingly useful; the development of 
a Darwinian theory of culture was greatly stimulated by mostly programmatic 
essays such as those by Campbell (1965), Ruyle (1973), and Cloak (1975).
However, they are useful only insofar as they indicate the possibility of, or need 
for, new plausibility arguments. An attack on an existing, often widely accepted, 
plausibility argument on the grounds that the plausibility argument is incom- 
plete is a kind of programmatic claim. Critiques of human sociobiology are com- 
monly of this type. Burden-of-proof claims are another variant. For example, 
sociobiologists often seem to imply that the general success of adaptive reasoning 
in biology means that the existence of any prima facie plausible adaptive in- 
terpretation of human behavior is a sufficient counter to anything but a perfect 
case for a nonadaptive explanation. 

Programmatic attacks and burden-of-proof claims can be positively harmful 
when taken, by themselves, as sufficient substitutes for a sound plausibility ar- 
gument. We have argued that theory about complex-diverse phenomena is 
necessarily made up of simple models that omit many details of the phenomena 
under study. It is very easy to criticize theory of this kind on the grounds that it is 
incomplete (or defend it on the grounds that it one day will be much more
complete). Such criticism and defense is not really very useful because all such
models are incomplete in many ways and may be flawed because of it. What is 
required is a plausibility argument that shows that some factor that is omitted 
could be sufficiently important to require inclusion in the theory of the phe- 
nomenon under consideration, or a plausible case that it really can be neglected 
for most purposes. Thus, for example, it is not enough to attack a purportedly 
general plausibility argument with a few special cases, for it is (or ought to be)
stipulated that generalized models are always likely to account more or less poorly 
for many special cases. In contrast, the success of genetic fitness-maximizing 
theory in biology cannot be used to defend that generalized model in the face of 
plausible arguments that cultural evolution is a divergent special case. 

It seems to us that until very recently, “nature-nurture” debates have been
badly confused because plausibility arguments have often been taken to have 
been successfully countered by programmatic claims. It has proved relatively easy 
to construct reasonable and increasingly sophisticated Darwinian plausibility 
arguments about human behavior from the prevailing general theory. It is also 
relatively easy to spot the programmatic flaws in such arguments; conventional 
Darwinian models do not allow for human culture. The problem is that pro- 
grammatic objections have not been taken to imply a promise to deliver a full 
plausibility claim. Rather, they have been taken as a kind of declaration of in- 
dependence of the social sciences from biology. Having shown that the biological 
theory is in principle incomplete, the conclusion is drawn that it can safely be
ignored. Sahlins’s (1976) objections to human sociobiology seem to us to have
been as much in this tradition as Tarde’s (1903:xxi-xxii) very early one. Both 
arguments ignore that Darwinian plausibility arguments ordinarily contain a se- 
rious rationale for accepting their claims despite the unique aspects of the human 
species. Certainly this is the case with contemporary human sociobiology and 
explains why it has attracted support by social scientists like van den Berghe 
(1979, 1981), who cannot be accused of simpleminded hereditarianism. 


The Importance of Scientific Pluralism 

Jared Diamond (personal communication) has drawn the following useful lesson 
from his experience as both a physiologist and a community ecologist: in phys- 
iology, controversial issues are ordinarily settled quickly by definitive experi- 
ments. As a result, debate over contending hypotheses is quite restrained and 
polite. One or the other contending claim is almost certain to turn out wrong in 
short order, and any grandiose pronouncements, ad hominem attacks, or similar 
departures from polite scientific discourse can be held against the loser. As long 
as scientists know that they can easily be proven wrong by a few critical exper- 
iments in the next few years, they will refrain from such departures. In ecology, 
major controversies last much longer because the issues are more complex and 
testing contending plausibility arguments is a long-drawn-out affair. The result is 
that individual claimants are often unlikely to be proven cleanly right or wrong, 
at least during their own lifetimes. Rhetorical excesses thus cannot be clearly 
proven as such by the failure of the programmatic claim or plausibility argument 
to which they are attached, and consequently the motivation to avoid them is 
reduced. 

Perhaps differences between these two disciplines can be understood in 
terms of Campbell’s (1979) general discussion of scientific honesty. According to
Campbell, scientists are more honest in their occupational behavior than other 
professionals, but not because they are morally superior as individuals. Rather, 
they are careful to present honest work because other scientists are very discrim- 
inating consumers. Scientists frequently replicate crucial experiments and can 
gain prestige by detecting errors. In a controversy, many members of the com- 
munity will act as relatively unbiased judges of the acceptability of contending 
hypotheses because their own work depends on using the correct result — say, to 
make a more accurate measurement instrument. Such acceptors have an interest 
in the resolution of the controversy but not a vested interest in any particular 
outcome. It seems likely that this mechanism will work much more effectively 
when controversial issues are resolved quickly, and consumer/acceptors can 
confidently use secure results in their own work. In the case of evolutionary and 
ecological problems, ambiguity lasts longer, and consumers may be forced to 
choose among plausibility arguments, thus coming to have a vested interest in 
the controversy. The extensive empirical program of the complex-diverse dis- 
ciplines reduces the incentive to replicate individual experiments directly be- 
cause they make so small a contribution to the total program. 




Campbell (1969, 1986) contributed an insightful analysis of another po- 
tentially serious problem in the study of complex-diverse subject matters: the 
social complexity of the sciences that study them. Specialization is obviously 
demanded by complexity and diversity. But there is no guarantee that disciplines 
will not evolve what Campbell characterized as parochial “tribal” norms and 
customs that impede scientific progress. His argument is illustrated with refer- 
ence to the arbitrary disciplinary boundaries, schools within disciplines, and the 
resulting “ethnocentrism” within the social sciences. Our impression is that 
the scientific endeavor becomes more prone to “ethnocentrism” as problems be- 
come more complex and diverse; certainly evolutionary biology, despite the 
unifying value of Darwinism, is not immune. As the enforcement of the universalistic
norms of scientific discourse weakens, very human motives, such as a
desire for collegial relations within one’s discipline, a tendency to find that one’s
extrascientific ideology can be squared one way or another with one’s science,
career considerations, and a need to economize on information, can easily lead 
the social structure of science in directions that reduce its collective ability to 
solve complex-diverse problems. The mental effort of keeping multiple, partly 
conflicting, plausibility arguments in mind, the ambiguous relationship of these 
to ideas and norms derived from other roles, and the need to have some knowl- 
edge of several unfamiliar disciplines might be psychological motivations that 
encourage the formation of independent disciplines and schools with little com- 
munication between them. Nevertheless, it seems inescapable that complex- 
diverse subjects demand free communication between specialists and a wide 
tolerance for the pursuit of temporarily divergent plausibility claims. 

Deriving norms from this diagnosis is by no means straightforward. Perhaps 
new disciplines and new ideas need a measure of isolation, which the develop- 
ment of ethnocentric and sectarian attitudes affords (Campbell, 1985; Beatty, 
1987). On the other hand, unchecked, this process can result in a declaration of 
independence for a mature discipline, such as Sahlins offers for anthropology, 
which may be wholly harmful. There may be an optimal amount of disciplinary 
and research program “ethnocentrism” for maximizing scientific progress at any 

Nonetheless, we think that the following two norms would, if adopted, 
improve scientific debate surrounding complex, diverse subjects. 

Ad hominem attacks on particular positions and the use of self-serving 
programmatic claims should be viewed as tacky. Given the deep importance of 
human behavior to humans, the weakness of the consumer/acceptor mechanism 
for regulating academic discourse, and the fact of the evolution of “ethnocen- 
tric” norms within disciplines, it is utopian to expect that the temptation to 
behave in such ways will always be resisted, particularly by those who are legitimately
pursuing a position. Widespread agreement that such behavior is
moderately offensive is perhaps a practical norm and might help to further
productive debate over real issues.

Scientists should be encouraged to take a sophisticated attitude toward 
empirical testing of plausibility arguments (Quinn and Dunham, 1983; Dia- 
mond, 1986). Folk Popperism among scientists has had the very desirable re- 
sult of reducing the amount of theory-free descriptive empiricism in many 
complex-diverse disciplines, but it has had the undesirable effect of encouraging
a search for simple mutually exclusive hypotheses that can be accepted or re- 
jected by single experiments. By our argument, very few important problems in 
evolutionary biology or the social sciences can be resolved in this way. Rather, 
individual empirical investigations should be viewed as weighing marginally for 
or against plausibility arguments. Often, empirical studies may themselves dis- 
cover or suggest new plausibility arguments or reconcile old ones. 


Conclusion 

We confess to being somewhat puzzled by the debate between the “adapta- 
tionists” and their critics. We suspect that most evolutionary biologists and 
philosophers of biology on both sides of the dispute would pretty much agree 
with the defense of the simple models strategy presented here. To reject the 
strategy of building evolutionary theory from collections of simple models is to 
embrace a kind of scientific nihilism in which there is no hope of achieving an 
understanding of how evolution works. On the other hand, there is reason to 
treat any given model skeptically. As Kitcher (1987) notes, his criticisms of 
optimality arguments are not meant as “forlorn skepticism,” but rather as helpful 
“in pinpointing strategies for improving hypotheses about selective pressures 
and functional significance” (p. 99). Kitcher quite properly and quite explicitly 
calls attention to the fact that because diversity and complexity are real, the 
tactic of seeking understanding via simple models is something that must be
done with care. No one ought to disagree. 

Unfortunately, the critics of “adaptationism” are not always as sophisticated 
as this; they sometimes seem to want to benefit rhetorically from a programmatic 
critique that implies scientific nihilism without having to face the real (and ex- 
tremely unpleasant) consequences of actually adopting it. It may be possible to 
defend the proposition that the complexity and diversity of evolutionary phe- 
nomena make any scientific understanding of evolutionary processes impossible. 
Or, even if we can obtain a satisfactory understanding of particular cases of 
evolution, any attempt at a general, unified theory may be impossible. Some 
critics of adaptationism seem to invoke these arguments against adaptationism 
without fully embracing them. The problem is that alternatives to adaptationism 
must face the same problem of diversity and complexity that Darwinians use the 
simple model strategy to finesse. The critics, when they come to construct 
plausibility arguments, will also have to use relatively simple models that are 
vulnerable to the same attack. If there is a vulgar sociobiology, there is also a vulgar 
criticism of sociobiology. Perhaps because we have devoted a considerable effort 
to building a plausibility argument for the novel and sometimes maladaptive role 
of culture in human evolution, we are very sensitive to the strength of the so- 
ciobiologists’ plausibility arguments and the weakness of most of the objections to 
them. 

In our opinion, human sociobiology has been a successful research program 
because it has made rather good use of the simple models strategy. Its practitioners
have taken care to construct sound plausibility arguments and, in the
spirit of scientific pluralism, to use the work of social scientists. As pursuers of a
somewhat narrow range of plausibility arguments, their work is not above crit- 
icism in detail or in general. As befits pursuers, they have usefully driven the 
fitness-optimizing postulate to extremes that are not likely to be ultimately 
warranted. Less usefully, they have used a burden-of-proof claim to attempt to 
insulate sociobiology from counterarguments. On the other hand, the attacks on 
sociobiology are a good source of negative object lessons. The criticism of human 
sociobiology has far too frequently depended on mere programmatic claims 
(often invalid ones at that, as when sociobiologists are said to ignore the im- 
portance of culture and to depend on genetic variation to explain human dif- 
ferences). These claims are generally accompanied by dubious burden-of-proof 
arguments. Some critics also show little sense of the importance of scientific 
pluralism. 


NOTE 

We thank D. T. Campbell, J. M. Diamond, J. M. Emlen, G. Macey, A. Rosenberg, 
E. A. Smith, J. Staddon, and S. Vail for comments on drafts of this chapter. We also
benefited from conversations with J. Quinn and J. Griesemer. 
20 Memes

Universal Acid or a 
Better Mousetrap? 


Among the many vivid metaphors in Darwin’s Dangerous Idea, one 
stands out. The understanding of how cumulative natural selection gives rise to 
adaptations is, Daniel Dennett says, like a “universal acid” — an idea so powerful 
and corrosive of conventional wisdom that it dissolves all attempts to contain it 
within biology. Like most good ideas, this one is very simple: once replicators 
(material objects that are faithfully copied) come to exist, some will replicate 
more rapidly than others, leading to adaptation by natural selection. The great 
power of the idea is that the resulting adaptations can be understood by asking 
what leads to efficient, rapid replication. Given that ideas seem to replicate, it is 
natural that Dawkins (1976, 1982), Dennett (1995), and others have explored 
the possibility of using this idea to explain cultural evolution. 
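
The logic of differential replication can be made concrete with a short sketch. The function below is an illustrative assumption, not anything proposed by Dawkins or Dennett: it takes hypothetical replicator types with fixed copying rates and renormalizes their shares each generation. The starting frequencies and rates are invented for the example.

```python
def replicate(frequencies, rates, generations):
    """Deterministic discrete-time replicator update.

    Each type's share is multiplied by its copying rate, and the shares are
    then rescaled so that they sum to one.
    """
    freqs = list(frequencies)
    for _ in range(generations):
        weighted = [f * r for f, r in zip(freqs, rates)]
        total = sum(weighted)
        freqs = [w / total for w in weighted]
    return freqs


# Hypothetical values: the fastest replicator starts out rare.
start = [0.98, 0.01, 0.01]
rates = [1.00, 1.05, 1.10]
final = replicate(start, rates, generations=200)
print([round(f, 3) for f in final])
```

With these assumed values, the type with the highest copying rate comes to dominate the population even though it begins at a frequency of one percent.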

Natural selection was not Darwin’s only powerful, far-reaching idea. Ernst 
Mayr (1982) has argued that what he calls “population thinking” was also among 
Darwin’s foundational contributions to biology. Before Darwin, species were 
thought to be essential, unchanging types, like geometric figures and chemical 
elements. Darwin saw that species were populations of organisms that carried a 
variable pool of inherited information through time. To understand the evolution 
of species, biologists had to account for the processes that changed the nature of 
that inherited information. Darwin thought that the most important processes 
were natural selection, sexual selection, and the “inherited effects of use and dis- 
use.” We now know that the last process is not important in organic evolution — 
unlike Darwin, modern biologists do not believe that the sons of blacksmiths 
inherit their father’s mighty biceps. Nowadays biologists think many processes 
that Darwin never dreamed of are important, including segregation, recom- 
bination, gene conversion, and meiotic drive. Nonetheless, modern biology is 
fundamentally Darwinian because its explanations of evolution are rooted in
population thinking. If Darwin were to be resurrected tomorrow through some 
miracle of cloning, we think he would be quite happy with his legacy. 

In this chapter we want to convince you that population thinking, not 
natural selection, is the key to conceptualizing culture in terms of material 
causes. This argument is based on three well-established facts: 

1. There is persistent cultural variation among human groups. Any explanation
of human behavior must account for how this variation
arises and how it is maintained. 

2. Culture is information stored in human brains. Every human culture 
contains vast amounts of information. Important components of 
this information are stored in human brains. 

3. Culture is derived. The psychological mechanisms that allow culture 
to be transmitted arose in the course of hominid evolution. Culture 
is not simply a by-product of intelligence and social life. 

Much of culture is information stored in human brains — information that 
got into those brains by various mechanisms of social learning. It follows that 
to explain the distribution of information stored in the brains of the members 
of the current generation, any coherent theory will have to account for the 
cultural information in the brains of the previous generation. The theory will 
also have to explain how this information, together with genes and environ- 
mental contingencies, caused the present generation to acquire the cultural 
information that it did. Unfortunately, we do not understand how this process 
works. It may be that cultural information stored in brains takes the form of 
discrete memes that are replicated faithfully in each subsequent generation, or 
it may not. This is an empirical question that at present is unanswered, and 
we will see that other models are possible. In every case, the Darwinian popu- 
lation approach will illuminate the process by which the cultural information 
that is stored in a population of brains is transformed from one generation to the
next.

We also want to convince you that population thinking can play an im- 
portant, constructive role in the human sciences. The fact that population 
thinking is logically necessary for a natural, causal, theory of culture does not 
necessarily mean that such a theory will be useful. Thus, we know that human 
culture must be consistent with quantum mechanics, but it is unlikely that such 
a connection will help us understand, say, ethnic conflict. However, we think 
Darwinian models of culture are useful for two reasons. First, they serve to 
connect the rich models of behavior based on individual action developed in 
economics, psychology, and evolutionary biology with the data and insights of 
the cultural sciences, anthropology, archaeology, and sociology. In doing so, we 
think that they can help shed light on important unsolved problems in the social 
sciences. Second, population thinking is useful because it offers a way to build 
a mathematical theory of human behavior that captures the important role of 
culture in human affairs. Population thinking is not a universal acid that will 
dissolve existing social sciences. But it is a better mousetrap, providing useful 
new tools that can help solve outstanding problems in the human sciences. 
Culture Is Heritable at the Group Level

One of the striking facts about the human species is that there are important, 
persistent differences between human groups that are created by culturally 
transmitted ideas, not genetic differences, or differences in the physical or biotic 
environment. Sonya Salamon’s (1992) research on immigrant communities in the
United States shows how cultural differences can give rise to different behaviors 
in the same environment. One of Salamon’s studies focused on two farming 
communities in southern Illinois. “Freiburg” (a pseudonym) is inhabited by the
descendants of German-Catholic immigrants who arrived in the area during the 
1840s. “Libertyville” (also a pseudonym) was settled by people from other parts 
of the United States — mainly Kentucky, Ohio, and Indiana — when the railroad 
arrived in 1870. These two communities are only about 20 miles apart and have 
been carefully matched for similar soil types. 

The people in these two communities have different values about family, 
property, and farm practice, and these differences seem consistent with their 
ethnic origins. The farmers of Freiburg tend to value farming as a way of life, and 
they want at least one son or daughter to continue as a farmer. In Freiburg, wills 
specify that the farm will go to a child who will farm the land and use farm 
proceeds to buy out any nonfarming siblings. Parents put considerable pressure 
on children to become farmers. They place little importance on education, 
knowing that advanced education often results in young people not returning to 
the farm. Salamon argues that these “yeoman” values are similar to those observed
among peasant farmers in Europe and elsewhere. In contrast, the “Yankee”
farmers of Libertyville regard their farms as profit-making businesses. They
buy or rent land depending on economic conditions, and if the price is right, they 
sell. Many Yankee farmers would prefer their children to continue farming, but 
they see it as an individual decision. Some families help their children enter 
farming, but many do not, and they generally place a strong value on higher 
education. 

The differences in values between Freiburg and Libertyville lead to measurable
differences in farm practices despite the proximity of the two towns and
the similarity of their soils. Farms are substantially larger in Libertyville — the 
mean size of farm operations in Libertyville is 518 acres compared to 276 acres 
in Freiburg. The Libertyville farms are larger because Yankee farmers rent more 
land. They rent more land because Yankees demand a higher income to stay in 
farming. Yeomen, who so value farming for its own sake, are content with lower 
incomes and fear the risks of debt-financed expansion. 

The two communities also show striking differences in farm operations. In 
Libertyville, as in most of southern Illinois, farmers specialize in grain produc- 
tion. It is the primary source of income for 77 percent of the farmers in Liber- 
tyville. In Freiburg, many people mix grain production with dairying or livestock 
raising, activities that are almost absent in Libertyville. Because animal husbandry 
is labor-intensive, it allows Germans to accommodate their larger families on 
their more limited acreage. Yankee farmers decided against dairying and stock 
raising because grain farming is more profitable and less work. 
The fact that culturally distinctive human groups behave differently in the
same environment implies that culture is heritable, at least at the group level. 
Many beliefs and values that are common in a group at one point in time are also 
common among the descendants of the same group. Any theory of how culture 
works must be consistent with this fact. It must explain why the German farmers 
of Freiburg hold different beliefs about life and land than their Yankee neighbors 
almost 150 years after leaving Europe. 


Culture Is Information Stored in Human Brains

Every human culture contains an enormous amount of information. Consider 
how much information must be transmitted to maintain a particular distinctive 
spoken language. A lexicon requires something like 10,000 associations between 
words and their meanings. Grammar entails a complex set of rules regulating 
morphosyntax, and although it is unclear to what extent these rules arise
from innate, genetically transmitted structures, it is clear that the rules that
underlie the grammatical differences that separate English and Chinese are culturally
transmitted. Subsistence techniques also entail large amounts of information.
For example, Blurton-Jones and Konner (1976) showed that the !Kung
San have a very detailed knowledge of the natural history of the Kalahari — so 
detailed, in fact, that the researchers were unable to judge the accuracy of much 
of !Kung knowledge because in some respects it exceeded Western biology. As
anyone who has ever tried to make a decent stone tool can attest, the manu- 
facture of even the simplest tool requires lots of knowledge; more complex 
technologies require even more. Imagine the instruction manual for constructing 
a seaworthy kayak from materials available on the North Slope of Alaska. The 
institutions that regulate social interactions incorporate still more information. 
Property rights, religious custom, roles, and obligations all require a considerable 
amount of detailed information. 

The vast store of information that exists in every culture cannot simply float 
in the air. It must be encoded in some material object. In societies without 
widespread literacy, the most important objects in the environment capable of 
storing this information are human brains and human genes. It is undoubtedly 
true that some cultural information is stored in artifacts. It may well be that the 
designs that are used to decorate pots are stored on the pots themselves and that 
when young potters learn how to make pots they use old pots, not old potters, as 
models. In the same way, the architecture of the church may help store infor- 
mation about the rituals performed within. Without writing, however, the 
ability of artifacts to store culture is quite limited. First, many artifacts are very 
difficult to reverse-engineer. The young potter cannot learn how to select clay 
and temper or how to fire a pot by studying existing ones. Second, much cultural 
information is semantic knowledge — how can an artifact store the notion that 
Kalahari porcupines are monogamous? Or the rules that govern bride-price 
transactions? 

It is also clear that much cultural information is not stored in human genes. 
In one sense this is obvious. The evidence is very clear that very little cultural 
variation results from genetic differences. We know that genetic differences do not
explain why some people speak Chinese and others English, or why the !Kung
know a lot more about the biology of porcupines than most readers of this chapter. 

However, there is a subtle and much more plausible way that genes could 
store cultural information. It could be that most human culture is innate, ge- 
netically transmitted information that is evoked by environmental cues. Pascal 
Boyer (1994) argues that much of religious belief has this character. For ex- 
ample, the Fang, a group Boyer studied in Cameroon, have elaborate beliefs 
about ghosts. For the Fang, ghosts are malevolent beings that want to harm the 
living; they are invisible and can pass through solid objects, and so on. Boyer 
argues that most of what the Fang believe about ghosts is not culturally trans- 
mitted; rather, it is based on the innate, epistemological assumptions that un- 
derlie all cognition. Once a young Fang child learns that ghosts are sentient 
beings, she does not need to learn that ghosts can see or that they have beliefs 
and desires — these components are provided by cognitive machinery that reliably 
develops in every environment. According to this view, cultural differences arise 
because different environmental cues evoke different innate information. A 
friend of ours believes in angels instead of ghosts because he grew up in an 
environment in which people talked about angels. However, most of what he 
knows about angels comes from the same cognitive machinery that gives rise to 
Fang beliefs about ghosts, and the information that controls the development of 
this machinery is stored in the genome. 

This picture of culture is a useful antidote to the simplistic view that culture 
is simply poured from one head into another. Evolutionary psychologists are 
surely right that every form of learning, including social learning, requires an 
information-rich innate psychology and that much of the adaptive complexity 
we see in cultures around the world stems from this information. However, it 
is a big mistake to ignore transmitted cultural information. The single most 
important adaptive feature of culture is that it allows the gradual, cumulative 
assembly of adaptations over many generations — adaptations that no single in- 
dividual could invent on his own. Cumulative adaptation cannot be based solely 
on innate, genetically encoded information. 

Consider the evolution of a relatively simple form of technology, the mar- 
iners’ magnetic compass (Needham, 1978). First, Chinese geomancers noticed 
the peculiar tendency of small magnetite objects to orient in the earth’s magnetic 
field, an effect that they used for purposes of divination. Then, Chinese mariners 
learned that magnetized needles could be floated on water to indicate direction 
at sea. Next, over several centuries Chinese seamen developed a dry compass 
mounted on a vertical pin-bearing, like a modern toy compass. Europeans ac- 
quired this type of compass in the late medieval period. European seamen then 
developed the fixed card compass that allowed a helmsman to steer an accurate 
course by aligning the bow mark with the appropriate compass point. Compass 
makers later learned to adjust iron balls near the compass to zero out the 
magnetic influence from the ship and to gimbal the compass and fill it with 
liquid to damp the motion imparted to the card by the roll and pitch of the ship. 
Even such a relatively simple tool was the product of at least seven or eight in- 
novations separated in time by centuries and in space by the breadth of Eurasia. 
This sort of adaptation occurs only because novel information can accumulate in
human populations, be stored in human brains, and be transmitted through time 
by teaching and imitation. 

Evolutionary psychologists argue that our psychology is built of complex, 
information-rich, evolved modules that are adapted for the hunting and gathering 
life that we pursued until the origins of agriculture a few thousand years ago. On 
this argument, humans can easily and naturally do the things we are really adapted 
to do like learn a language or understand the feelings of others. Inventing complex 
modern artefacts like the compass is hard, but what about skills necessary for 
hunting and gathering? Couldn’t we learn these as easily as we learn language? 
Doesn’t our brain contain the information necessary to follow hunting and 
gathering ways? Our ancestors lived as hunter-gatherers of some kind for the last 
2 or 3 million years. If we had to do so, couldn’t we reinvent that stuff, just as Fang 
children invent the properties of their ghosts, or children can invent a grammar? 

Good questions, but we think the answer is almost certainly “Are you nuts?!”
Consider the following thought experiment. Suppose you are stranded in some 
not-too-extreme desert environment, not the Empty Quarter or the Atacama, but 
the desert between Sonoita, Mexico, and Yuma, Arizona. Your task is to survive 
and raise your kids without modern technology. You will be given the resources to
survive a few months to get your feet on the ground before we take away your last 
tin of food and your last steel tool — a little time to see what comes naturally. Will 
you make it? 

We don’t think so. The stretch between Sonoita and Yuma is known as El 
Camino del Diablo, “the Devil’s Road.” It was one leg of the main overland
route from Old Mexico to California until the coming of railroads. For more 
than a century it was used by Spanish, Mexican, and American travelers. To get 
that far, every traveler had to already be an experienced frontiers-person, and 
no doubt most were hardbitten, desert-wise, and well equipped with familiar 
technology. It was the best of several bad routes and was comparatively well 
known and well marked. Still, it was an infamous leg of the journey, and many 
travelers ended up in the hasty graves that litter the route. 

Now, consider that the Camino del Diablo was also home to Papago
Indians who, with a few pounds of wood, stone, and bone equipment, an im- 
pressive amount of hard-won knowledge, and a well-adapted system of social 
institutions, lived and raised their children in the very same desert that killed so 
many pioneers. If our task was to survive in this desert without our accustomed 
industrial technology, we would certainly trade a few hours of tutoring by a 
traditional Papago for any number of months trying to summon an innate 
knowledge of the desert. 


Culture Is Derived 

Simple forms of social learning, often termed “protoculture,” occur in many 
other species of animals. In a review of the social transmission of foraging 
behavior, Lefebvre and Palameta (1988) give 97 examples of protocultural
variation in foraging behavior in animals as diverse as baboons, sparrows, lizards,
and fish. Much of the evidence for protoculture in other animals consists of
observations of different behavior by populations of the same species living in 
similar environments. For example, chimpanzees in the Mahale Mountains of 
Tanzania often adopt a unique grooming posture in which both partners extend 
one arm over their heads, clasp hands, and then groom one another’s exposed 
arm pits. These grooming hand-clasps occur often and are performed by all 
members of the group. Chimpanzees at Gombe, who live less than 100 kilo- 
meters away in a similar type of habitat, often groom but never perform this 
behavior. Sometimes scientists have observed the spread of a novel behavior. 
One famous example comes from Japan where a group of Japanese macaques, 
whose range included a sandy beach, were provisioned with sweet potatoes. 
A young female macaque accidentally dropped her sweet potato into the sea 
as she was trying to rub the sand off it. She must have liked the result, as she 
began to carry all of her potatoes to the sea to wash them. Other monkeys 
followed suit. However, it took other members of the group quite some time to 
acquire the behavior and many monkeys never washed their potatoes. Finally, 
some evidence for protoculture in other animals comes from experiments that 
demonstrate that behavior is socially transmitted. The most famous case is the 
transmission of song dialects in birds like the white-crowned sparrow. 

There is little evidence, however, of cumulatively evolved cultural traditions 
in other species. With a few exceptions, social learning leads to the spread of 
behaviors that individuals could have learned on their own. For example, food 
preferences are socially transmitted in rats. Young rats acquire a preference for a 
food when they smell the food on the pelage of other rats (Galef, 1988). This
process can cause the preference for a new food to spread within a population. It 
can also lead to behavioral differences among populations living in the same 
environment, because current foraging behavior depends on a history of social 
learning. However, it does not lead to the cumulative evolution of complex new 
behaviors that no individual rat could learn on its own. Thus, in other animals it 
is quite plausible that most of the detailed information that creates protocultural 
differences is stored and transmitted genetically. 

Circumstantial evidence suggests that the ability to acquire novel behaviors 
by observation is essential for cumulative cultural change. Students of animal 
social learning distinguish observational learning, which occurs when younger 
animals observe the behavior of older animals and learn how to perform a novel 
behavior by watching them, from a number of other mechanisms of social trans- 
mission, which also lead to behavioral continuity without observational learning 
(Galef, 1988; Visalberghi and Fragaszy, 1990; Whiten and Ham, 1992). One
such mechanism, local enhancement, occurs when the activity of older animals 
increases the chance that younger animals will learn the behavior on their 
own. Imagine a young monkey acquiring its food preferences as it follows 
its mother around. Even if the young monkey never pays any attention to what 
its mother eats, she will lead it to locations where some foods are common 
and others rare, and the young monkey may learn to eat much the same foods
as its mother.

Local enhancement and observational learning are similar in that they 
can both lead to persistent behavioral differences among populations, but only 
observational learning allows cumulative cultural change (Tomasello, Kruger, and
Ratner, 1993). To see why, consider the cultural transmission of stone tool use.
Suppose that occasionally early hominids learned to strike rocks together to 
make useful flakes. Their companions, who spent time near them, would be 
exposed to the same kinds of conditions, and some of them might learn to make 
flakes too, entirely on their own. This behavior could be preserved by local 
enhancement because groups in which tools were used would spend more time 
in proximity to the appropriate raw materials. However, that would be as far as 
tool-making would go. Even if an especially talented individual found a way to 
improve the flakes, this innovation would not spread to other members of the 
group because each individual learned the behavior anew, without any detailed 
guidance from innovators who have improved on the common technique. Local 
enhancement is limited by the learning capabilities of individuals and the fact 
that each new learner must start from scratch. With observational learning, 
on the other hand, innovations can be incorporated into others’ behavioral rep- 
ertoires if younger individuals are able to acquire the improved behavior by 
observational learning. To the extent that observers can use the behavior of 
models as a starting point, observational learning can lead to the cumulative 
evolution of behaviors that no single individual could invent on its own. 
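
The contrast between the two mechanisms can be made concrete with a toy simulation. What follows is only an illustrative sketch, not a model taken from the studies cited above. It assumes that "skill" can be measured as a single number, that each learner adds a random individual improvement, and that the parameter names and values (`generations`, `max_gain`, `seed`) are arbitrary choices made for the example. Under local enhancement every learner starts from zero; under observational learning every learner starts from the level reached by its model.

```python
import random


def simulate(generations, observational, seed=0, max_gain=1.0):
    """Return the skill level reached in each generation of one lineage.

    Hypothetical toy model: skill is a number, and each learner adds a
    random individual improvement in [0, max_gain] to its starting point.
    """
    rng = random.Random(seed)
    skill = 0.0
    history = []
    for _ in range(generations):
        # Observational learning: start from the model's skill.
        # Local enhancement: start from scratch every generation.
        start = skill if observational else 0.0
        skill = start + rng.uniform(0.0, max_gain)
        history.append(skill)
    return history


local = simulate(50, observational=False)
observed = simulate(50, observational=True)
print(f"local enhancement, final skill:      {local[-1]:.2f}")
print(f"observational learning, final skill: {observed[-1]:.2f}")
```

In this sketch the locally enhanced lineage can never exceed what one individual achieves in a lifetime (`max_gain`), while the observational lineage accumulates every generation's improvement. Real learning is of course noisier and lossier than this, so the sketch shows only the logic of the argument, not its magnitude.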

Adaptation by cumulative cultural evolution is apparently not a by-product 
of intelligence and social life. Capuchin monkeys are among the world’s cleverest 
creatures. They resemble apes in having quite large brains for their size. In na- 
ture, they perform many complex behaviors, and in captivity they can be taught 
extremely demanding tasks. Capuchins live in social groups and have ample 
opportunity to observe the behavior of other individuals of their own species. 
Yet good laboratory evidence indicates that these monkeys make little or no use 
of observational learning (Visalberghi and Fragaszy, 1990). Observational
learning is not simply a by-product of intelligence and the opportunity to observe
conspecifics. Rather, it seems to require special psychological mechanisms
(Bandura, 1986). This conclusion suggests that the psychological mechanisms
that enable humans to learn by observation are adaptations that have been 
shaped by natural selection in the human lineage because culture is beneficial. 


Cultural Evolution Is Darwinian 

Now, let us consider what these facts imply for a theory of culture. Consider a 
population of individuals who are culturally interconnected; they speak dialects 
of a single language, use similar technology, share relatively similar beliefs about 
the world, and have similar moral values. People in this population think and 
behave differently from other peoples, in part, because they have different 
culturally transmitted information stored in their brains. Next consider the 
descendants of this population, say 100 years later. The culture of the descen- 
dant population will be similar in many ways to that of their predecessors. Their 
language will be similar, and they may often use similar technology, have similar 
beliefs about the world, and subscribe to a similar moral system. The fact that 
culture depends on information stored in the brains of this population requires us to
account for how the information that generates these similarities was transmitted
from the brains in the first population to the brains in the second.

Of course, there will also be differences between the two populations, some 
small, some great. Some of these differences will arise because some behaviors 
are more common in the second population — for example, perhaps what was 
previously a rare usage or form of pronunciation has become common. Other 
differences will arise because genuinely new behavior is present, either as a result 
of borrowing from neighboring populations or due to genuine innovation. Thus, 
a complete theory would also have to account for why some forms of cultural 
information spread, and why some forms have diminished, and how innovation
occurs.

Cumulative cultural change requires observational learning. People observe 
the behavior of others, and (somehow) acquire the information necessary to
produce a reasonable facsimile of the same behavior. In any given time period, 
each person observes only a sample of the people who make up his population. A 
very small child is exposed mainly to the people in her family, older children are 
exposed to peers and teachers, and adults to yet a wider range of people. We will 
refer to this group of people as an individual’s “cultural sample.” For most of 
human history cultural samples were small, but nowadays they may be immense. 
On the other hand, for some elements of culture many people may be dispro- 
portionately influenced by a single charismatic leader or acknowledged expert. 

The fact that cultures often persist over time with little change means that 
the commonness of a behavior in an individual’s cultural sample must have a 
positive effect on the probability that the individual ultimately acquires the 
cultural information that generates that behavior. Such a tendency could arise in 
several different ways: if observational learning takes the form of approximately 
unbiased copying, then common behaviors will be more frequent in cultural 
samples, and therefore will be more likely to be copied. It could also be that the 
psychology of observational learning itself predisposes people to acquire more 
common behaviors. Finally, it could be that rare behaviors are typically disad- 
vantageous and less likely to be retained as a result of individual learning and 
experimentation, or even by natural selection against them. 
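
The first two of these possibilities can be compared in a short calculation. The sketch below is an illustration under stated assumptions rather than a model from the text: it assumes two variants, a well-mixed population, a cultural sample of `n` models drawn at random, and two hypothetical learning rules named `unbiased` and `conformist`. It does not cover the third possibility, in which rare behaviors are weeded out by individual learning or natural selection.

```python
from itertools import product


def p_acquire(p, n, rule):
    """Probability that a learner ends up with variant A.

    p    -- frequency of A in the population (assumed well mixed)
    n    -- size of the learner's cultural sample (hypothetical parameter)
    rule -- maps a tuple of n observed variants (True = A) to the
            probability that the learner adopts A
    """
    total = 0.0
    for sample in product([True, False], repeat=n):
        k = sum(sample)  # number of models showing variant A
        prob_sample = p ** k * (1 - p) ** (n - k)
        total += prob_sample * rule(sample)
    return total


def unbiased(sample):
    # Copy one randomly chosen member of the sample.
    return sum(sample) / len(sample)


def conformist(sample):
    # Adopt whichever variant is in the majority (n is assumed odd).
    return 1.0 if sum(sample) > len(sample) / 2 else 0.0


for p in (0.2, 0.5, 0.8):
    print(p, round(p_acquire(p, 3, unbiased), 3),
          round(p_acquire(p, 3, conformist), 3))
```

Under both rules the chance of acquiring a variant rises with its frequency, which is all that cultural continuity requires. The difference is that unbiased copying leaves frequencies unchanged on average, whereas the conformist rule pushes the common variant above its current frequency and the rare variant below it.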

It follows that cultural change is a population process. The argument pro- 
ceeds in several steps: 

• To understand how a person behaves, we have to know the nature of 
the information stored in her brain 

• To understand why people have the beliefs that they do, we must 
know what kinds of behaviors characterized their cultural sample 

• To predict the distribution of cultural samples that exists, one must 
know the cultural composition of the population 

• Therefore, to understand how people behave, we must understand 
why the population has the cultural composition that it does 

Similarities between descendant and ancestral populations arise because the 
necessary information has been transmitted from individual to individual 
through time without significant change. Differences occur because some var- 
iants have become more common, others have become more rare, and some 
completely new variants have been introduced. Thus, to account for both
continuity and change we need to understand the population processes by which
ideas are transmitted through time. 


Culturally Transmitted Skills and Beliefs May 
Not Be Replicators 

In The Extended Phenotype, Richard Dawkins (1982) argues that the cumulative 
evolution of complex adaptations requires what he calls replicators, things in the 
physical world that produce copies of themselves and have the following 
three additional properties: 

1. Fidelity. The copying must be sufficiently accurate that even after a
long chain of copies the replicator remains almost unchanged. 

2. Fecundity. At least some varieties of the replicator must be capable 
of generating more than one copy of themselves. 

3. Longevity. Replicators must survive long enough to affect their own 
rate of replication. 

Replicators give rise to cumulative adaptive evolution because replicators 
are targets of natural selection. Genes are replicators — they are copied with 
astounding accuracy, they can spread rapidly, and they persist throughout the 
lifetime of an organism, directing its machinery of life. Dawkins thinks that 
beliefs and ideas are also replicators. On the face of it, this is an apt analogy. 
Beliefs and ideas can be copied from one mind to another, spreading through a 
population, controlling the behavior of people who hold them. 

But there are reasons to doubt that beliefs and skills are replicators, at least 
in the same sense that genes are. Unlike genes, ideas are not copied and trans- 
mitted intact from one brain to another. Instead, the information in one brain 
generates some behavior; somebody else observes this behavior and then (some- 
how) creates the information necessary to generate very similar behavior. The 
problem is that there is no guarantee that the information in the second brain is 
the same as the first. For any phenotypic performance, there are potentially an 
infinite number of rules that would generate that performance. Information will 
be transmitted from brain to brain only if most people induce a unique rule from 
a given phenotypic performance. While this may often be the case, it is also 
plausible that genetic, cultural, or developmental differences among people may 
cause them to infer different beliefs from the same overt behavior. To the extent 
that these differences shape future cultural change, the replicator model captures 
only part of cultural evolution. 

The generativist model of phonological change illustrates the problem. Ac- 
cording to the generativist school of linguistics, individual pronunciation is gov- 
erned by a complex set of rules that takes as input the desired sequence of words 
and produces as output the sequence of sounds that will be produced (Bynon, 
1977). Generativists also believe that, as adults, people can modify their pro- 
nunciation only by adding new rules that act at the end of the chain of existing 
rules. Children, on the other hand, are not constrained by the rules used to 
generate adult speech. Instead, they induce the simplest set of grammatical rules
that will account for the performances they hear, and these may be quite different 
than the rules used by adult speakers. Although the new rules produce the same 
performance, they can have a different structure and, therefore, allow further 
changes by rule addition that would not have been possible under the old rules. 

The following example (from Bynon, 1977) illustrates this phenomenon. In
some dialects of English, people pronounce words that begin with wh using what 
linguists call an “unvoiced” sound while they pronounce words beginning with 
w using a voiced sound. (Unvoiced sounds are produced with the glottis open, 
resulting in a breathy sound, whereas voiced sounds are produced with the 
glottis closed, causing a resonant tone.) People who speak such dialects must 
have mental representations of the two sounds and rules to assign them to 
appropriate words. Now suppose that people who speak such a dialect come into 
contact with other people who only use the voiced w sound. Further suppose 
that this second group of people is more prestigious, and accordingly people in 
the first group modify their speech so that they too use only voiced ws. Ac- 
cording to the generativists, they will accomplish this change by adding a new 
rule that says “voice all unvoiced ws.” So, Larry wants to say “Whether it is better
to endure.” The part of his brain that takes care of such things looks up the mental
representations for each of the words, including whether, which has an unvoiced 
w (because that is the way Larry learned to speak as a child). Then after any 
other processing for stress or tone, the new rule changes the unvoiced w in 
whether to a voiced w. Children learning language in the next generation never 
hear an unvoiced w, and, according to generativists, they adopt the same under- 
lying representation for whether and weather. Thus, even though there is no dif- 
ference in the phenotypic performance among parents and children, children do 
not acquire the same mental representation as their parents. This difference may 
be important because it will affect further changes. For example, it might make 
it less likely that the two sounds would split again in the future. The adult 
version of the rule still has a latent distinction between the voiced and unvoiced 
pronunciation that could serve as the basis for renewing the distinction, whereas, 
if the generativists are correct, the latent distinction is unavailable to child 
learners who hear only one usage. 
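
The logic of this example can be written out as a small sketch. Everything in it is a simplifying assumption made for illustration: the spellings `"hwether"` and `"wether"` are crude stand-ins for phonemic forms, with `"hw"` marking an unvoiced w, and the names `pronounce` and `voice_all_w` are hypothetical. It is not a rendering of any actual generativist formalism.

```python
def pronounce(lexicon, rules, word):
    """Apply an ordered chain of rules to a word's underlying form."""
    form = lexicon[word]
    for rule in rules:
        form = rule(form)
    return form


def voice_all_w(form):
    # The late-added adult rule: "voice all unvoiced ws".
    return form.replace("hw", "w")


# Adults keep two underlying forms and add a rule at the end of the chain.
adult_lexicon = {"whether": "hwether", "weather": "wether"}
adult_rules = [voice_all_w]

# Children hear only voiced w, so they store one underlying form, no rule.
child_lexicon = {"whether": "wether", "weather": "wether"}
child_rules = []

for word in ("whether", "weather"):
    print(word,
          pronounce(adult_lexicon, adult_rules, word),
          pronounce(child_lexicon, child_rules, word))
```

Both generations produce identical output, yet the stored information differs. If the added rule were dropped, `pronounce(adult_lexicon, [], "whether")` would recover the unvoiced form, while the child's representation has nothing left to recover.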


Replicators Are Not Necessary for Cumulative 
Adaptive Evolution 

We also doubt that replicators are necessary for the cumulative evolution of 
complex features. Here is an example of a transmission system that does just 
that. When you speak, the kind of sounds that come out of your mouth depends 
on the geometry of your vocal tract. For example, the consonant p in spit is 
created by momentarily bringing your lips together with the glottis open. Nar- 
rowing the glottis converts this consonant to b as in bib. Leaving the glottis open 
and slightly opening the lips produces pf, as in the German word apfel (apple). 
Linguists have shown that even within a single speech community individuals 
vary in the exact geometry of the vocal tract used to produce any given word. 
Thus, it seems plausible that individuals vary in the culturally acquired rule
about how to arrange the inside of the mouth when they are speaking any par- 
ticular word. Languages vary in the sounds used and this variation can be very 
long-lived. For example, in dialects spoken in the northwest of Germany, p is 
substituted for pf in apfel and many similar words. This difference arose about AD 
500 and has persisted ever since (Bynon, 1977).

So how are different rules governing speech production transmitted from 
generation to generation? Consider two models. 

First, suppose that each child learning language is exposed to the speech of a 
number of adults. These adults vary in the way that they produce the pf sound in 
apfel. Each child figures out how she would need to position her tongue to 
produce the same pf sound as each adult model, and then she adopts one of these 
as her own rule. Here, a mental rule that governs speech production is trans- 
mitted from one individual to another. The mental rule is a replicator; it clearly 
has fidelity. It has longevity because it potentially persists for generations, and it 
would have fecundity if the rule was more attractive than competing rules. And 
because it is a replicator, it can evolve. 

Now consider a second model. As before, children are exposed to the speech 
of a number of adults who vary in the way that they pronounce pf. Each child 
unconsciously computes the average of all the pronunciations that he hears and 
adopts the tongue position that produces this average. Here, mental rules are not 
transferred from one brain to another. The child may adopt a rule that is unlike 
any of the rules in the brains of his models. The rules in particular brains do not 
replicate because no rule is copied faithfully. The phonological system can 
nonetheless evolve in a quite Darwinian way. More attractive forms of pro- 
nunciation can increase if they have a disproportionate effect on the average. 
Rules affecting different aspects of pronunciation can recombine and thus lead to 
the cumulative evolution of complex phonological rules. It is true that the act of 
averaging will tend to decrease the amount of variation in the population each 
generation. However, phenotypic performances will vary as a result of age, social 
context, vocal tract anatomy, and so on. Learners will often misperceive an 
utterance. These sorts of errors in transmission will keep pumping variation into 
a population as averaging bleeds it away. In fact, averaging might be necessary to 
prevent high noise levels from injecting too much variation into the population 
(see Cavalli-Sforza and Feldman, 1981; Boyd and Richerson, 1985).
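
The balance between averaging and transmission error in the second model can be sketched in a few lines. This is an illustration under assumptions of our choosing, not the formal models in the works just cited: a pronunciation rule is reduced to a single number, each child averages the noisy performances of `n_models` adults drawn at random, and the names and values of `noise_sd`, `pop_size`, and `generations` are hypothetical. The sketch leaves out the differential attractiveness of some pronunciations.

```python
import random
import statistics


def next_generation(rules, n_models, noise_sd, rng):
    """Each child averages the noisy performances of n_models adults."""
    children = []
    for _ in rules:
        models = rng.choices(rules, k=n_models)
        # What the child hears is the adult's rule plus a transmission error.
        heard = [m + rng.gauss(0.0, noise_sd) for m in models]
        children.append(sum(heard) / n_models)
    return children


def run(noise_sd, generations=200, pop_size=500, n_models=2, seed=1):
    """Return the variance among rules after the given number of generations."""
    rng = random.Random(seed)
    rules = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        rules = next_generation(rules, n_models, noise_sd, rng)
    return statistics.pvariance(rules)


print("variance with no transmission error:", run(noise_sd=0.0))
print("variance with transmission error:   ", run(noise_sd=0.5))
```

Without error, averaging drives the variance toward zero and nothing is left for any selective process to act on. With error, the variance settles at a positive level; a rough calculation under these assumptions suggests a value near `noise_sd**2 / (n_models - 1)` in a large population. No child copies any adult's rule exactly, yet the population keeps heritable variation.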

There are still other possibilities that differ even more radically from the 
replicator model. For example, a propensity to imitate the common type in the 
population can be coupled with high rates of individual learning to create a model 
in which there is little heritable variation at the individual level, but substantial 
heritability of group differences (Henrich and Boyd, 1998). In such a model
the cumulative evolution of adaptive complexity can occur, and occur rapidly,
through selective processes that act at the group level (Boyd and Richerson, 1990,
2002). Similarly, in recent models of the evolution of social institutions (Young,
1998), there is no cultural transmission at the individual level. Although
individuals simply acquire the best response to their social environment by
trial-and-error learning, the structure of social interactions creates persistent, heritable
variation at the group level. 

We do not understand in detail how culture is stored and transmitted, so we
do not know whether culturally transmitted ideas and beliefs are replicators or 
not. If the application of Darwinian thinking to understanding cultural change 
depended on the existence of replicators, we would be in trouble. Fortunately, 
culture need not be closely analogous to genes. Ideas must be gene-like to the 
extent that they are somehow capable of carrying the cultural information nec- 
essary to give rise to the cumulative evolution of complex cultural patterns that 
differentiate human groups. They exhibit the essential Darwinian properties of 
fidelity, fecundity, and longevity, but, as the example of phonemes shows, this 
can be accomplished by a most ungene-like, replicatorless process of error-prone 
phenotypic imitation. All that is really required is that culture constitutes a sys- 
tem maintaining heritable variation. 


Darwinian Models Are Useful 

Science on the frontier often has an anarchic, nervy flavor because it must deal 
with multiple uncertainties. Of course, we would be better off knowing exactly 
what memes are. Papering over the uncertainties of how culture is stored and 
transmitted no doubt leads to errors and conceals areas of fruitful inquiry. But as 
the psychologists explore one part of the frontier, the evolutionists should probe 
others. Studying the population properties of cultural information has lots of 
implications for human cognitive psychology, and vice versa. For example, when 
a child has the chance to copy the behavior of several different people, does she 
choose a single model for a given, discrete cultural attribute? Or does she av- 
erage, or in some other way combine, the attributes of alternative models? The 
minute you try to build a population model of culture, you see that this question 
is crucial. However, despite conducting thousands of experiments on social 
learning, psychologists apparently have never thought to answer this question. 
Just as at a four-way stop, it makes no sense for everyone to wait for everyone 
else. Watch what the other drivers are doing, certainly, but go whenever the road 
ahead is clear. 

Many social scientists have reacted to the advent of Darwinian models of 
culture with palpable distaste (e.g., Hallpike, 1986), while others have embraced
these ideas with enthusiasm (e.g., Runciman, 1998). Much of this variation can 
be explained by people’s feelings about the current Balkanization of the social 
sciences. The world of social science is divided into self-sufficient “ethnies” like 
anthropology and economics that are content to follow the questions and pre- 
suppositions that govern their discipline. The inhabitants of this world regard 
other disciplines with a mixture of fear and contempt and take little interest in 
what they have to say about questions of mutual interest. Clearly, this is not a 
satisfactory state of affairs. 

We believe that Darwinian models can help rectify this problem. Disciplines 
such as economics, psychology, and evolutionary biology take the individual as 
the fundamental unit of analysis. These disciplines differ about how to model the 
individual and his psychology, but because they have the same fundamental 
structure, there has been much substantive interaction between them. Nowadays, 
many economists and psychologists work closely together, and a rich new body
of work, often called “behavioral economics,” has rapidly become mature 
enough to be applied to important practical problems such as the effect of 
retirement accounts on national savings rates. In the same way, economists and 
evolutionary biologists have found it relatively easy to work together on evolu- 
tionary models of social behavior, a rapidly growing field in both disciplines. 

Other disciplines like cultural anthropology and sociology emphasize the 
role of culture and social institutions in shaping behavior, and researchers in 
sociology, anthropology, and history find interaction with each other relatively 
comfortable. Bridging the gap between the individual and cultural disciplines has 
proved much more difficult. Darwinian models are useful precisely because they 
incorporate both points of view within a single theoretical framework in which 
individuals and culture are articulated in a way that captures some, if not all, of 
the properties that their respective specialists claim for them. In population- 
based models, culture and social institutions arise from the interaction of in- 
dividuals whose psychology has been shaped by their social milieu. As a bonus, 
Darwinian models come with tools to investigate the population-wide, long- 
term consequences of the interactions between individuals and their culture and 
social institutions. 

To see how useful population-based models can be, consider the problem of 
human cooperation. There is no coherent explanation for the vast scale of co- 
operation in contemporary human societies, or why the scale of cooperation has 
increased many 1000-fold over the last 10,000 years. Models in economics and 
evolutionary biology predict that cooperation should be limited to small groups 
of relatives and reciprocators. Many theories in anthropology simply assume 
(often implicitly) that cooperative societies are possible and that culturally
transmitted beliefs and social institutions serve the interest of social groups, but 
no attempt is made to reconcile this assumption with the fact that people are at 
least partly self-interested. Darwinian models provide one cogent mechanism to 
explain human cooperation by identifying the conditions under which groups will 
come to vary culturally and predicting when such variation will lead to the spread 
of culturally transmitted beliefs that support large-scale cooperation (Soltis, 
Boyd, and Richerson, 1995). In such models, the effect of different culturally
transmitted beliefs on group prestige and group survival shapes the kinds of 
beliefs that survive and spread. These group-level effects in turn influence what 
people want and what they believe and, therefore, their behavior. Other recent 
work on the evolution of institutions (Young, 1998; Richerson and Boyd, 2002) 
makes us optimistic that Darwinian models may have widespread utility. 

Population thinking is also useful because it offers a way to build mathe- 
matical theory of human behavior that captures the important role of culture in 
human affairs. Mathematical theory has the great advantage of allowing con- 
clusions to be reliably deduced from assumptions. Experience in economics and 
evolutionary biology also suggests that it leads to a kind of clear understanding 
that is difficult to achieve with verbal reasoning alone. Of course there is also a 
cost — mathematical theory is necessarily based on simplified models. However, 
the combination of mathematical and verbal reasoning is superior to either
alone.

Memes are not a universal acid, but population thinking is a better
mousetrap. Population modeling of culture offers social science useful concep- 
tual tools and handy mathematical machinery that will help solve important, 
long-standing problems. It is not a substitute for rational actor models, or careful 
historical analysis. But it is an invaluable complement to these forms of analysis 
that will enrich the social sciences.

group beneficial traits, 143, 229-238
success or prestige based, 17, 107, 111, 
120, 190, 229-238 
Big game hunting, 262 
Big-man equilibrium, 139 
Biological species concept, 315 
Birdsong dialects, 52, 54, 55, 76, 426 
Birth limitation a redefinition of carrying 
capacity, 365-6 
Blackfeet people, 263 
Blending inheritance, variance in, 431 
Body fat in prey as constraint on diet, 358 
Bokondini-Dani people, 212 
Bolling-Allerod warm event, 348, 350 
Brain size increase 
costs of, 72-3, 76, 79 
evolution in Neogene, 68 
in mammals, 17
Brain specialized for species niche, 69
Brazil climate records, 343 
Burden of proof claims usually harmful, 413 
Bureaucracy, 221, 267, 269 

California, 324, 341, 345, 356, 359-60 
Cameroon, Fang people in, 424 
Camino del Diablo, survival in, 425
Canada, Athapaskan languages in, 324 
Capacities
for complex culture cannot increase when
rare, 78
organic that make culture possible, 106
Capuchin monkeys, 427 
Carbon dioxide limitation of photo- 
synthesis, 346-8 
Cariaco Basin core, 343 
Catholic Church, 265, 268, 329 
Cebus monkey, 56 
Cetaceans, 79 

Charismatic innovators, 268 
Checkerspot butterfly, 315 
Chimbu people, 212 
Chimpanzees, 52, 54, 55, 77, 426 
China, spread of agriculture in, 357 
China climate record, 343 
Chinese development of magnetic compass, 
424 

Choice as cultural evolutionary force, 255-6 

Chomskian linguistics, 264 

Christianity, 84 

Climate and climate variation. 

See Agriculture, origins of; 
Environmental variability; Millennial 
and sub-millennial scale variation; 
Pleistocene climates 
Coercion as work-around, 262-3 
Coevolution. See Gene-culture coevolution 
Cognitive adaptations 
costly, 66 

general versus special purpose, 68-70 
modules as, 425 
Cognitive complexity
an adaptation to variable environments, 66
great scientific puzzle, 72-5
Cognitive economics, 72-5 
Cognitive style, 273 
Coherence of cultural elements 
linguistic examples, 328 
meaning, 320 
processes favoring, 320-2 
small units (examples) 328-30 
Collective decision-making institutions, 221 
Command-and-control hierarchy, 266 
Comparative method, 332-3 
Competitive ratchet, 339, 349, 358 
Complex cumulative culture, 13, 66, 78 
Complex models critiqued, 377, 402-3 
Complexity of nature and culture, 377, 398 
utility of simple models to understand, 
402-4
Conflict
intergroup, 261 
intracultural, 265 

Conformity bias (conformist transmission), 142.
See also Biased cultural transmission

can favor genetic maladaptations, 192 
can stabilize punishment, 190-2 
evolution of, 30-2 
leads to adaptive norms, 85-91 
maintains variation between groups, 207 
Confucianism, 266 

Contemporary climate change, impact on 
agriculture, 365 
Contrada, wards and Palio, 267
Contrite tit-for-tat, 136 
Cooperation 

contingent on many things, 253 
costly signaling explanation, 270-1 
culture-based favors prosocial genes, 199-200 
Darwin’s hypothesis, 251 
definition of, 274 

empirical test of cultural group selection 
explanation, 204 

evolution favored by punishment,
disfavored by high rates of mixing, 244-5

evolution of contingent, 135-141 
evolution of in sizable groups, 146 
explanation of tightly constrained by 
facts, 270 

human cooperation in large groups of 
non-relatives unique, 166, 189, 242, 
251-4, 395 

human cooperation major scientific 
problem, 433 
institutions matter, 253 
a result of human intelligence, 273 
review of theories of, 252-9 
within groups leads to conflict between 
groups, 101, 133 

Cooperative institutions, evolution of, 260-70

Coordination, 85, 119 
tends to produce multiple stable 
equilibria, 300-1 
Corvids, 79 

Costly information. See Information, costly 
Costly signals, 270-1 
Cuisine, 99 

Cultural adaptation rapid and cumulative, 143, 261

Cultural component 
definition, 326 

historical linguistic examples of, 327-8 
Cultural descent with modification, 54 
Cultural ecologists, 5 
Cultural evolution 
definition of, 274-5, 339 
derived from Bayesian assumptions, 376 
dynamics of innovation, 353-5 
impossible to control, 269 


intellectual history of, 7 
is a population process, 428-9 
is Darwinian, 427-9 
isolating processes, 311 
Lamarckian, 256 

mechanisms leading ESS amount of 
imitation, 391-3 
origins of agriculture as a natural 
experiment in, 338 
path dependence important, 299 
processes regulating rate of, 360-1 

rate of, 143, 346 

sketch of Darwinian theory of, 399-402 
synthetic role of theory of, 433 
theory as a plausibility argument, 412 
theory of as tools for historians, 285 
timescales of, 355 

ultimate versus proximate role for, 259 
understanding using Darwinian methods, 
287 

Cultural explanations, prejudice against, 6 
Cultural group selection, 17, 134, 198, 
260-4, 274, 433 

and evolution of altruistic punishment, 
241-9 

how works, 206-8 

payoff based imitation form fast, 91-5, 
141-3, 229-38 

rate of in New Guinea highlands, 143, 
204, 218-21 

rate relatively slow, 228-9 
roles of fast and slow forms of, 239 
spreads cooperation and punishment, 192
on subgroups within a society, 221 
Cultural inertia, 380 

Cultural meaning as force for coherence, 320 
Cultural phylogenies, 284 
in assemblages of coherent units, 318 
cultures as species, 317-8 
current practice for reconstructing, 324-6 
in hierarchically integrated cultural 
systems, 318 
reconstructing, 317-32 
when cultures are collections of 
ephemeral entities, 318-9
Cultural recombination, 236-8 
difficulty for maintaining cultural 
coherence, 327 
Cultural transmission, 56-7 
component of model of ethnic 
boundaries, 111 
empirical evidence for, 208-22 
evolution of psychological capacities 
for, 58 

Cultural variation, 270, 421 

can respond to group selection, 134 
decision-making maintains, 220 
definition of, 53 

farming practices as example of, 422 
maintained by social enhancement versus 
imitation, 44
maintenance of group level, 206-7, 213-4, 261

model assumptions supported, 218 
not environmental variation, 53-4 
Culture 

allows humans to transcend evolutionary 
imperatives, 103 
analogy with genes, 377, 399 
can “domesticate” genes, 401 
can only be understood historically, 287 
common in nature, 52 
complexity of human traditions, 77 
creates novel evolutionary tradeoffs, 8 
a Darwinian evolutionary system, 4 
definition, 3, 6, 105, 252, 287 
derived not ancestral system, 425-7 
evolutionarily active, 104 
evolution of capacities for, 9, 52-3, 104 
as evolving (review of processes) 258-9 
how increases average fitness, 39-44
improves human adaptability, 36-51 
information stored in human brains, 421, 
423-5 

maintains heritable variation, 432 
meaning as a force for coherence, 320 
neither autonomous nor prisoner of 
genetic constraints, 103 
not necessarily replicated, 429 
origins of, 399 
in other animals, 53-6, 76-8 
part of human biology, 4 
population thinking necessary to 
understand, 421 

as powerful adaptive system, 10 
role in evolution of human cognition, 70 
role of innate information in, 424 
a system of inheritance, 103, 389, 399, 422 
Cultures 

core principles, ultimate sacred
postulates, or root paradigms of, 320
history of cultures not “pure” 
hypotheses about structure of, 317-9 
in non-human animals, 425-7 
of organizations, 380 
phylogenies of, 284, 317-32 
population-like versus species-like, 311, 
333 

Cumulative cultural evolution, 49-50, 52, 100

adaptation to climate chaos, 143 
fast and frugal heuristics and complex 
adaptive behavior, 191 
important in humans, 54-5, 424, little 
evidence for in non-humans, 54 
origins of, 104 
Cushitic languages, 328-9 

Darwinian methods for study of cultural
evolution, 258-9, 287, 339, 378, 400

Darwinian review of models of cultural 
evolution, 288-90 


Darwinian social science, 6 
Darwinian theory both scientific and his- 
torical, 287-8 
Datoga people, 329 

Decision-making forces, 290, 400. See also 
Biased cultural transmission 
Decision theory, 20, 33 
Democracy, 270 
Demographic transition, 366 
Denmark, spread of agriculture in, 360 
Descent in cultural evolution. See also 

Phylogenetic reconstruction of cultural 
descent 

Bantu political traditions as case, 325-6 
comparison of core and small unit 
hypotheses, 330 

of core traditions: evidence for, 322-6 
cultures as wholes: evidence for, 322 
Indo-European historical linguistics as 
case, 325 

mechanisms causing longevity and 
coherence, 319-22 
of memes, 331 

of small cultural components: evidence, 
326-30 

when impossible or uninteresting, 331-2 
Descent in organic evolution, 312-6

common properties of genes and species, 
316-7 

of genes, 312-3 
of species, 315 

when phylogenies reticulated, 313-5
Design complexity 

IBM 370 microprocessor as example, 
296-7 

number of qualitatively distinct 
equivalent optima, 296-9 
very large number of local optima, 293-5

Design tradeoffs in evolution of minds, 73 
Development, role of environment, 8 
Developmental constraints on responses to 
selection, 298 
Dialect, 99, 108 
Dialect evolution, 268 
Diffusion of innovations, 107, 323 
Dinka people, 221, 261 
Divergent evolution, versus convergent 
evolution, 292 

Diversity of cultural and natural processes, 
398 

favors use of toolkit of simple models, 
402-4 

Domestication 
bean, 357 
goosefoot, 356 
maize, 356-7, 360 
root crops, 357 
squash, 357 
sunflower, 356 
Dress, 99 

Dual inheritance theory, 103 



450 SUBJECT INDEX 


Dugum Dani people, 212 
Dynamism of plant and beetle populations 
in Pleistocene epoch, 345 

Eastern North America, spread of 
agriculture in, 356 
Eastern Woodland societies, 344 
Economic inequality, 266 
Economists, 376 
Efficiency, definition of, 364-5 
Egalitarian societies, 266 
Empirical tests of simple models, 404 
Encephalization, 68, 76, 79 
Engineers, 376 
English language, 327-8, 424 
Environment 

dimensionality very large, 70 
novel, 70 

Environmental variability. See also 
Pleistocene climates 
cultural adaptations to, 15 
extreme in glacial periods, 17 
favors evolution of social learning, 25-9, 
32 

Ethnic boundaries 

predictions about the nature of, 129-30
testing model of, 113 
Ethnic markers 

acting to isolate cultures, 311 
requirements for the evolution of, 122-8

Ethnicity, 99, 118 
example of entwining of genes and 
culture, 104 
Ethnocentrism, 100 
of scientific disciplines, 415, 432 
Eurasia, isolation of cultures in, 334 
Europe, spread of agriculture in, 338, 359 
European nation-states, 266 
Evoked culture, 70
Evolution
as always multilevel, 256-8
of genes controlling social learning, 24-32 
of complex cognition, 66 
logic of genetic and cultural similar, 
255-6 

of social learning, 21-32 
of social organization slow, 345 
of tribal social instincts, 260-4 
in variable environments, 25-9 
Evolutionarily stable strategy approach, 24 
Evolutionary biologists, 376 
Evolutionary biology as source of concepts 
and methods to study cultural 
evolution, 105 

Evolutionary equilibrium for reciprocity in 
large groups, 153-7 

Evolutionary mechanisms, as generating 
historical contingency, 284 
Evolutionary psychology, 424-5 
extreme version of information-rich 
modules argument implausible, 425 


Evolutionary social science, as a 
methodology consistent with 
many theories, 259-60 
Evolutionary social scientists on culture, 8 
Evolutionary theory 
as accounting system, 6 
of culture, 255-6 
not reductionistic, 377 
recursive and multi-level, 255-8
Experimental games, 271 
Explanations, hard versus soft, 6 
Exploitative elites, 266 
External versus internal explanations, 16 

Faiwolmin tribal area, 214-5 
Family level societies, 262. See also 
Small-scale societies 
Fang people, 424 

Female circumcision (genital mutilation), 328, 329

Fertile crescent, spread of agriculture in, 350 

Fish, 54, 294, 425 

Fitness 

malignant functions, 298 
maximizing models example of 
generalized sample theory, 405 
peak shifting on complex topographies, 299-300

topography metaphor, 295-7 
Flemish language group, 217 
Florida climate record, 343 
Folk theorem, 84, 135, 139, 178 
Folk wisdom, 394-5 
Food preferences socially transmitted in 
rats, 54 

Food taboos, 206 
Fore, 210, 212, 216 
France, Upper Paleolithic societies 
in, 262 

Free riders, second order, 140, 189, 247 
Freemasonry, 329 
French language, 328 
Functionalism, 84, 204, 251, 291 
ahistorical, 294 
limits to, 220 
Fundamentalist sects, 268 

Gahuku people, 212 
Galton’s problem, 332 
Game theory, 95 
Gebusi people, 216, 330 
Gene flow and phylogeny, 314 
Gene-culture coevolution, 4-5, 9, 116, 144, 
199-200, 254 

builds cultural imperatives into the genes, 
264 

genes “domesticated” by culture, 401 
led to evolution of tribal social instincts, 
263-4

reshaped human nature, 270 
Generalized sample theories, 404-6 
kin selection an example, 410
General systems, versus purpose learning
systems, 71-5 

Generativist model of phonological change, 
430-1 

Generous tit-for-tat, 138
Genes
affecting cultural transmission, 58
analogy with culture shallow, 377 
prosocial selected by cooperative cultural 
norms, 199-200 

Genetic constraints on responses to 
selection, 298-9 
Genetic value defined, 306 
Genetic variation, 53, 270 
German language, 327-8 
Germany, growth of democracy in, 270 
Ghosts, 424 
Global warming, 362 
Goeammer people, 216 
Gombe, chimpanzees at, 426 
Great Basin, 262-3, 333 
Great Plains, ecology and ceremony in, 333

Greco-Roman urban civilization imposed 
upon barbarians, 321 
Greek city-states, 267 
Greenland, climate variation in, 344 
Greenland ice cores, 340-4, 349 
Group beneficial strategy, conditions for 
spread, 233-6 

Group boundaries. See also Ethnic 
boundaries and related topics 
differences strongest at, 126 
permeable, 99 

Group explanations, versus individual level 
explanations, 5 
Group extinction, 209 
Group level cultural recombination, 236-8

Group selection, 141-3, 257. See also 
Cultural group selection 
cultural versus genetic, 249 
Darwin’s tribal group selection 
hypothesis, 251 
evidence against, 260 
interdemic, 241 

thought plausible in human case by 
prominent evolutionists, 275 
Gunwingga people, 275 
Gusii people, 329 

Han China, 266 
Hapsburgs, 329 

Hawaii, population build-up in, 359 
Heuristics, 17, 70, 71 
Himalayas climate record, 343 
Historical change 

defined as divergence in similar 
environments, 292-3 
defined as non-stationary, 291-2 
and phylogenetic reconstruction, 310 
product of chaotic dynamics, 303-3 


product of developmental or genetic 
constraints, 298-9 

product of evolution on rough fitness 
topographies, 294-7, 299-300 
product of multiple stable equilibria 
(see also Folk theorem) 300-1
product of random forces, 293 
slow change requires evolutionary 
explanation, 221 
versus general laws, 283 
Historical linguistics, 284, 310 
wave versus genetic models of linguistic 
evolution, 330 

Historical traces, longevity of, 319-20 
Historical versus scientific explanation, 283, 
288, 290-1, 303-6 
cannot be disentangled as separate 
enterprises, 304 
dichotomy false, 291 
Holism, 320 

Holocene epoch, 67, 339, 340, 348-9, 362 
Housecats, 76 
Huli people, 212 
Human sociobiology 
debate about, 103 

depends upon decision-making forces, 290, 400

example of a plausibility argument, 411-2

subject of dubious programmatic attacks, 
413 

a successful research program, 416-7 
as theory of utility functions, 395 
useful exploration of the limits of fitness 
optimizing models, 409-10 
Humans, wide range compared to other 
primates, 10 

Human uniqueness, 4, 133 
Hunting and gathering, 273 

frontiers with agriculturalists sometimes 
stabilize, 360 

made efficient use of plants in Holocene, 
348-9 

persisted unusually long in western North 
America, 339 
Pleistocene, 345 

relation to origins of agriculture, 357 
use of plants in Pleistocene, 358-9 
Huron society, 344 
Hybrid zone, 316 

Hypothesis testing versus testing plausibility 
arguments, 410-2 

IBM 370 microprocessor, 295 
Ice cores, 17, 67, 340-1 
Ideal types, 397 
Ilaga Dani people, 212 
Imagined communities, 268 
Imitation. See also Observational 
learning 

allows cumulative improvement, 42-4 
allows selective learning, 40-2
capacity for cannot increase when rare,
59-61 

comparative study in chimpanzees and 
children, 77-8 
definition, 44 

evolutionary equilibrium amount of, 41 
experimental evidence for in non-human 
animals, 56 

must have arisen by natural selection, 76
requires special-purpose cognitive 
machinery, 60 
true, definition, 54-5, 77 
why adaptive, 85-9 
Imitators versus learners, 15

fitnesses in Rogers’s model, 36-40 
Imprinting, 74 

Inclusive fitness. See Kin selection 
Indian caste system, 330 
Indians, of Western North America, 217 
evolution of Plains culture after 
introduction of horses, 219 
Northwest Coast, 262 
Indirect reciprocity, 146, 257, 270-1 
language and, 275 
Individual learning. See Learning, 
individual 

Individual level processes, as not explaining 
historical change, 221 
Indo-European expansion, 333 
Indo-European language reconstruction, 325

Information, 3 

costly, 8-9, 17, 379, 391-2, 410, 412 
costs leading to alternative plausibility 
argument, 412 
imperfect, 20, 391-2, 410 
innate, 70 

large reservoirs of information, 423 
noisy, 14 

prestige and conformity biases 
adaptations to uncertainty of, 232 
Inheritance of acquired variation, 14
Inherited habits, 289 

Innate programming versus individual and 
social learning, 75-7 
Institutions 
definition, 253 
evolution of, 260-70 
highly variable, 253-4 
important in human behavior, 253 
product of cultural evolution, 253-5 
tribal scale, 262-3 
Intelligence, 69, 78 

Interdisciplinary study, social sciences as 
insufficient in, 375 
Inuit people, 54 

Irian Jaya, ethnographic data from, 205, 217 

Islam, 329 

Italy 

civic institutions in Northern versus 
Southern, 270 
climate change in, 343 


Jale people, 212 

Japan, spread of agriculture in, 356 

Japanese macaque, potato washing in, 55

Jaqai people, 212 

Jarmo, early farming site at, 338 

Jate people, 212 

Joint stock companies, 239 

Jomon culture, 358-9 

Kalahari, 262, 272, 423 
Kalenjin language group, 217 
Kapauku people, 212 
Kenya, 130 
Kikuyu people, 130 
Kin selection, 105, 146, 189, 257 
example of robust sample model, 410 
Kiwai people, 212 
Kukukuku people, 212 
Kuma people, 212 
Kuria people, 329 
Kuru disease, 216 

Lago Grande de Monticcio core, 343 
Lamarckian inheritance (and evolution) 14, 
290, 400, 409 
Language 

examples of coherence of small cultural 
units, 328 

an index of cultural phylogeny, 320, 324 
linguistic diversity as adaptive barriers to 
communication, 271-2 
role in indirect reciprocity, 275 
Western North American Indians, 217 
Language-technology coevolution, 333 
Last glacial climate, 340-4 
Leadership, 266-7, 269 
Learning, individual, 19, 66, 391-2 
accuracy and cost determines value of 
social learning, 28, 32, 43 
costly, 35 

social learning multiplies power of, 70 
versus social learning and innate 
programming, 75-7, 86 
Learning, social. See Social learning 
Legislature, and group-functional behavior, 
221 

Legitimate institutions, 269-70 
Little Ice Age, 344 
Lizard, 54, 425 

Local enhancement defined, 55, 426-7. See 
also Imitation; Observational learning 
Local population as source of valuable 
information, 100 
Logistic equation, 351 

Ma’a language, 328 

Macaque, 105 

MacArthur Foundation, 249

Machiavellian intelligence, 273 

Mae Enga people, 209, 219 

Mahale Mountains, chimpanzees in, 426 

Mailu people, 212
Maize, 356, 394
Maladaptations, 9-11, 18 

conformist transmission stabilizes, 192 
maintained by moralistic punishment, 177

origin by runaway evolution of symbolic 
systems, 116 

predictable byproduct of cultural 
transmission, 10, 395, 400 
Maladaptations in social arrangements 
result of abuses of elite power, 266 
result of failures of hierarchical 
bureaucracies, 267 

Maladaptive consequences of workarounds 
coercive dominance, 266 
difficulty in creating and maintaining 
trust, 269-70 
segmentary hierarchy, 267 
symbolically marked groups, 268 
Mana, concept of, 217 
Mander cultural area, 216 
Marind-Anim people, 212, 217 
Mariner’s compass, 242 
Maring people, 209-10, 212 
Marker trait, 107 
Markets, 223, 269 
Mass media, 268 

Mathematical models. See also Models 
needed to study population level 
phenomena, 105, 433 
simple models versus complex 
phenomena, 376-7 
Mayans, 325 
Meaning, 320 

Mediterranean, climate change in, 345

Melpa people, 212 

Memes 

critique of, 377, 434 
as mind viruses, 323 
Mendi people, 210 
Mental representations, 57 
Mesoamerica, spread of agriculture in, 356 
Mesolithic societies, 348, 360 
Mexico, 344, 356, 360 
Migration, 141 

and phylogeny, 314 

Wright island and stepping-stone models 
of, 142 

Military, 265, 267 

Millennial and sub-millennial scale climate 
variation, 67-8, 340-4 
Millingstones, 359
Mind design, 74, 79 
Models. See also Simple models 
of basic population level processes, 383-4

of Bayesian decision-maker with access to 
traditional information, 381-93 
of biased imitation, 388-9 
as caricatures, 404 
complex critiqued, 377, 402-3 
continuous versus discrete, 57
of cooperation and punishment with 
conformity and group selection, 
192-202 

of cultural evolution, 105-6 
of cultural recombination at group level, 
236-8 

of cumulative learning, 49-50 
of dispersal, 352-3 

of dynamics of innovation, 353-5, 363-4 
of evolutionarily stable amount of 
tradition, 384-8 

of evolution of conformity, 89-90 
of evolution of ethnic markers, 106-14, 
119-28 

of evolution of reciprocal cooperation, 146-60, 162-3

of evolution of reciprocity with 
retribution (punishment) 170-7,
179-86 

of evolution of social learning, 56-64 
of fast form of group selection, 91-5 
of gene-culture coevolution, 199-200 
of heterogeneous environments, 386-8 
of how group beneficial equilibria spread, 
231-6 

of individual and social learning, 21-32 
of learning and imitation by Rogers, 36-7 
of learning and imitation in variable 
environment, 45-9 

of natural selection (on cultural variation) 389-91

of population dynamics, 363 
of population pressure with diffusion and 
innovation, 351-5 

of replicator dynamics in a structured 
population, 229-38 
of simulation of evolution of altruistic 
punishment, 243-8 
strategy for addressing controversial 
questions, 412 

tradeoffs between generality, realism, and 
accuracy in constructing, 405 
utility of simple, 403-4 
verbal unreliable, 377, 405 
Modern societies versus small-scale 
societies, 264-5 

Modular organization of cognition, 69-70, 
73-4 

Monkey, 55 

Moralistic punishment, 91, 134, 138-40, 
167, 248 

stabilizes anything, 176-9 
Moral systems, 84 
Mormon norms, 84 
Multilevel nature of evolution, 256-8 
Multiple stable equilibria, 261. See also Folk 
theorem 
Myth, 217 

NaDene peoples, 325 
Naidjbeedj cultural area, 216 
Naked mole rat, 189
Natufian culture, 348, 350-1, 357, 359
Natural selection 

abstract category, 255 
acting on cultural variation, 10, 259, 289, 
392, 400 

conditions to favor increased reliance on 
social learning, 20-33 
generally favors small brains, 68 
shapes learning rules adaptively, 20 
Nature-nurture dichotomy, 8 
controversy confused, 413 
Navaho people, 324 
Neanderthals, 272 
Near East, 338-9, 356-8 
Neo-Darwinian synthesis, 315 
Neolithic societies, 348, 357, 360 
New group formation, as important to 
cultural group selection, 211-3 
New Guinea, 143, 205, 217-8, 261, 267, 
323, 330, 356 
South Coast, 217 

New World, isolation of cultures in, 334 
New Zealand, population build-up in, 359 
Nilotic peoples, 328-9 
Non-human animals, culture of, 425-7 
Non-parental transmission fitness costs and 
benefits, 401 

Non-stationary time series, 291-2 
Norms, 119 

Athapaskan as an example of cultural 
persistence, 321 
definition, 84 

functional versus dysfunctional, 228 
group beneficial, 227-8, 238 
help people make good decisions cheaply, 
83-4 

persistence explained, 227-8 
North America, spread of agriculture in, 338 
North China, spread of agriculture in, 338, 
356 

Northwest Europe, 356 
Nuer people, 221, 261, 273 
Numic languages, 333 

Observational learning. See also Imitation 
critical to cumulative cultural evolution 
in humans, 427 
defined, 54-5, 426-7 
limited to humans, 55 
requires special psychological 
mechanisms, 56 
Octopus, 52 

Ohalo II archaeological site, 358 
Ok people, 212, 213-4, 323-4 
Orangutan, 56 

Paleolithic societies, 262 
Palio (horse race of Siena), 267 
Papago people, 425 
Parrot, 79 

Path dependence, 292 

Pavlov reciprocating strategy, 136 


Phenotypic flexibility, 67 
Phylogenetic reconstruction in biology 
classification, 310 
detection of constraints, 311 
inferences about history, 310 
Phylogenetic reconstruction of cultures. 

See also Descent in cultural evolution 
comparison of core and small unit 
hypotheses, 330 
core traditions: evidence, 322-6 
cultures as wholes: evidence for, 322 
partial phylogenies and the study of 
adaptation, 332-3 
why important, 332 
Pigeon, 52, 70 

Plant intensive subsistence systems, 345-6 
Plausibility arguments versus conventional
hypothesis testing, 410-2 
human sociobiology as example of, 411 
versus programmatic attacks, 413 
Pleistocene climates, 16, 74, 143 
climate seasonality, 365 
deterioration of, 67-8 
hunter-gatherers under, 345
millennial and sub-millennial scale 
variation, 340-4 
role in cognitive evolution, 66 
role in deterring agriculture, 338, 339-44 
Pleistocene epoch, 67-8, 354-5, 362 
Poland, Solidarity movement in, 145 
Police, 265-6 
Pollen record, 345 
Polynesia, 217, 325, 359 
ranked lineage system, 266 
Popperian falsificationism improper 
epistemology for ecology and 
evolution, 411 
Population growth 

has wrong time scale to explain origins of 
agriculture, 351 

limited by growth of subsistence, 354 
Population level properties 

and complex cultural traditions, 16 
of culture, 8-10 

implications for cognitive psychology, 432

linkage to individual level, 110 
necessary to explain rates of historical 
change, 221 

similar in the cases of genes and culture, 
105, 289, 378 

of social learning, 20

necessary to understand culture, 421
Pottery, 359
Prestige and charisma, 267
Prestige systems, 15. See also Biased

Punishment, 84, 189
altruistic, 169, 241-9 
cooperation favored by, 246-8

theory) 5-6, 84, 190, 238, 252, 273
critique of, 379-80 

incomplete without theory of tradition, 
391, 393 

second generation bounded, 254 
Rational planning, 223 
Rational self-interest, as failing to explain 
experimental data, 145 
Reciprocity, 134, 135-41, 189 
effects of kin selection in pairs versus 
larger groups, 159-60
evolution in large groups, 137-41, 146, 
148-60, 168-9 

evolution in pairs, 147-9, 166, 167-8

288, 290-1, 303-6
cannot be disentangled as separate 
enterprises, 304 
dichotomy false, 29

critique of, 397
empirical tests of, 406

students of, 107

programming, 75-7, 86

413-4

Tierra del Fuego, 272
Tiriki people, 328

Toolkit of models as theory, 376-7, 408
Tor peoples, 211, 212, 216-7
Tradition. See also Culture
acts like a system of inheritance, 394